PATH COUPLING AND AGGREGATE PATH COUPLING


YEVGENIY KOVCHEGOV AND PETER T. OTTO 


Abstract. In this survey paper, we describe and characterize an extension to the classical path 
coupling method applied to statistical mechanical models, referred to as aggregate path coupling.
In conjunction with large deviations estimates, we use this aggregate path coupling method 
to prove rapid mixing of Glauber dynamics for a large class of statistical mechanical models, 
including models that exhibit discontinuous phase transitions which have traditionally been 
more difficult to analyze rigorously. The parameter region for rapid mixing for the generalized 
Curie-Weiss-Potts model is derived as a new application of the aggregate path coupling method. 
1. Introduction

The theory of mixing times addresses a fundamental question that lies at the heart of statistical 
mechanics. How quickly does a physical system relax to equilibrium? A related problem arises in 
computational statistical physics concerning the accuracy of computer simulations of equilibrium 
data. One typically carries out such simulations by running Glauber dynamics or the closely 
related Metropolis algorithm, in which case the theory of mixing times allows one to quantify 
the running time required by the simulation. 

An important question driving the work in the field is the relationship between the mixing
times of the dynamics and the equilibrium phase transition structure of the corresponding
statistical mechanical models. Many results for models that exhibit a continuous phase transition
were obtained by a direct application of the standard path coupling method. Standard path
coupling [5] is a powerful tool in the theory of mixing times of Markov chains in which rapid mixing
can be proved by showing that the mean coupling distance contracts between all neighboring
configurations of a minimal path connecting two arbitrary configurations.

For models that exhibit a discontinuous phase transition, the standard path coupling method 
fails. In this survey paper, we show how to combine aggregate path coupling and large deviation 
theory to determine the mixing times of a large class of statistical mechanical models, including 
those that exhibit a discontinuous phase transition. The aggregate path coupling method extends 
the use of the path coupling technique in the absence of contraction of the mean coupling distance 
between all neighboring configurations of a statistical mechanical model. The primary objective 
of this survey is to characterize the assumptions required to apply this new method of aggregate 
path coupling. 

The manuscript is organized as follows: in Section 2 we give a brief overview of mixing
times, coupling and path coupling methods, illustrated with a new example of path coupling.
Then, beginning in Section 3, we introduce the class of statistical mechanical models considered
in the survey. In the subsequent sections we develop and characterize the theory of aggregate path
coupling and apply it to derive the parameter region for rapid mixing for
the generalized Curie-Weiss-Potts model that was introduced recently in [25].


2. Mixing Times and Path Coupling 


The mixing time is a measure of the convergence rate of a Markov chain to its stationary
distribution and is defined in terms of the total variation distance between two distributions $\mu$
and $\nu$ defined by

$$\|\mu - \nu\|_{TV} = \sup_{A \subset \Omega} |\mu(A) - \nu(A)| = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|.$$
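The two expressions for the total variation distance can be compared numerically on a small state space. The following is a minimal sketch; the function names and the example distributions `mu` and `nu` are illustrative assumptions, not part of the original text.

```python
from itertools import combinations

def tv_half_l1(mu, nu):
    """Total variation distance as half the L1 distance between two pmfs (dicts)."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in states)

def tv_sup_over_events(mu, nu):
    """Total variation distance as the maximum of |mu(A) - nu(A)| over all events A."""
    states = sorted(set(mu) | set(nu))
    best = 0.0
    for r in range(len(states) + 1):
        for event in combinations(states, r):
            gap = abs(sum(mu.get(x, 0.0) - nu.get(x, 0.0) for x in event))
            best = max(best, gap)
    return best

# Hypothetical example distributions on a three-point space.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
```

The enumeration over events is exponential in the number of states, so it is only meant as a check of the identity on tiny examples.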


Given the convergence of the Markov chain, we define the maximal distance to stationarity to be

$$d(t) = \max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{TV}$$

where $P^t(x, \cdot)$ is the transition probability of the Markov chain starting in configuration $x$ and
$\pi$ is its stationary distribution. Rather than obtaining bounds on $d(t)$, it is sometimes easier to
bound the standardized maximal distance defined by

$$\bar{d}(t) := \max_{x, y \in \Omega} \|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \tag{1}$$

which satisfies the following result.

Lemma 2.1. ([29] Lemma 4.11) With $d(t)$ and $\bar{d}(t)$ defined above, we have

$$d(t) \leq \bar{d}(t) \leq 2\, d(t).$$

Given $\epsilon > 0$, the mixing time of the Markov chain is defined by

$$t_{mix}(\epsilon) = \min\{t : d(t) \leq \epsilon\}.$$

In the modern theory of Markov chains, the interest is in the mixing time as a function of the
system size $n$ and thus, for emphasis, we will often use the notation $t_{mix}(n)$. With only a handful
of general techniques, rigorous analysis of mixing times is difficult and the proof of exact mixing
time asymptotics (with respect to $n$) of even some basic chains remains elusive. See [29] for a
survey on the theory of mixing times.
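For a chain small enough to store its transition matrix, the quantities $d(t)$, $\bar{d}(t)$ and $t_{mix}(\epsilon)$ can be computed directly from matrix powers. The sketch below does this for the lazy random walk on a cycle, which is an illustrative choice and not an example from the survey; all function names are assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def tv(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def lazy_cycle(m):
    """Lazy simple random walk on the m-cycle (m >= 3): hold w.p. 1/2, else step +-1."""
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][i] += 0.5
        P[i][(i + 1) % m] += 0.25
        P[i][(i - 1) % m] += 0.25
    return P

def distances(P, pi, t_max):
    """Return the lists d[t] and dbar[t] for t = 0, ..., t_max."""
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d, dbar = [], []
    for _ in range(t_max + 1):
        d.append(max(tv(row, pi) for row in Pt))
        dbar.append(max(tv(Pt[x], Pt[y]) for x in range(n) for y in range(n)))
        Pt = mat_mul(Pt, P)
    return d, dbar

def t_mix(d, eps):
    """Smallest t with d[t] <= eps, or None if not reached within the horizon."""
    return next((t for t, val in enumerate(d) if val <= eps), None)

m = 8
P = lazy_cycle(m)
pi = [1.0 / m] * m          # uniform distribution is stationary for this walk
d, dbar = distances(P, pi, 200)
```

With these lists one can check Lemma 2.1 entrywise and read off, for instance, `t_mix(d, 0.25)`.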

Rates of mixing times are generally categorized into two groups: rapid mixing which implies 
that the mixing time exhibits polynomial growth with respect to the system size, and slow mixing 
which implies that the mixing time grows exponentially with the system size. Determining where 
a model undergoes rapid mixing is of major importance, as it is in this region that the application 
of the dynamics is physically feasible. 

2.1. Coupling Method. The application of coupling (and path coupling) to mixing times of 
Markov chains begins with the following lemma: 

Lemma 2.2. Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then

$$\|\mu - \nu\|_{TV} = \inf\{P\{X \neq Y\} : (X, Y) \text{ is a coupling of } \mu \text{ and } \nu\}.$$

This lemma implies that the total variation distance to stationarity, and thus the mixing time,
of a Markov chain can be bounded above by the probability $P(X_t \neq Y_t)$ for a coupling of the
Markov chain $(X_t, Y_t)$ starting in different configurations, i.e. $(X_0, Y_0) = (\sigma, \tau)$, or if one of the
coupled chains is distributed by the stationary distribution $\pi$ for all $t$.

We run the coupling of the Markov chain, not necessarily independently, until the two chains meet at
time $\tau_c$. This is called the coupling time. After $\tau_c$, we run the chains together. We see that $X_t$
must have the stationary distribution for $t \geq \tau_c$, since $X_t = Y_t$ after coupling.

Theorem 2.3 (The Coupling Inequality). Let $(X_t, Y_t)$ be a coupling of a Markov chain where
$Y_t$ is distributed by the stationary distribution $\pi$. The coupling time of the Markov chain is
defined by

$$\tau_c := \min\{t : X_t = Y_t\}.$$

Then, for all initial states $x$,

$$\|P^t(x, \cdot) - \pi\|_{TV} \leq P(X_t \neq Y_t) = P[\tau_c > t]$$

and thus $t_{mix}(\epsilon) \leq E[\tau_c]/\epsilon$.
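As an illustration of how a coupling time controls the distance to stationarity, the following sketch simulates a standard coupling of two lazy random walks on the hypercube $\{0,1\}^n$. This chain and coupling are an assumed textbook example, not taken from the survey: at each step one coordinate is chosen uniformly and set to the same fresh random bit in both chains. The two chains start from opposite corners, so the coupling time is the time needed to refresh every coordinate, and a union bound gives $P[\tau_c > t] \leq n(1 - 1/n)^t$.

```python
import random

def coupling_time(n, rng):
    """Coalescence time of the coordinate-refresh coupling on {0,1}^n."""
    x = [0] * n            # chain started from the all-zeros state
    y = [1] * n            # chain started from the all-ones state
    t = 0
    while x != y:
        i = rng.randrange(n)       # coordinate chosen uniformly
        bit = rng.randrange(2)     # same fresh bit used in both chains
        x[i] = bit
        y[i] = bit
        t += 1
    return t

def tail_estimate(n, t, trials, seed=0):
    """Monte Carlo estimate of P[tau_c > t] for the coupling above."""
    rng = random.Random(seed)
    return sum(coupling_time(n, rng) > t for _ in range(trials)) / trials
```

Comparing `tail_estimate(n, t, trials)` with `n * (1 - 1/n) ** t` gives an empirical view of the bound on $P[\tau_c > t]$ that enters the Coupling Inequality.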
From the Coupling Inequality, it is clear that in order to use the coupling method to bound
the mixing time of a Markov chain, one needs to bound the coupling time for a coupling of the 
Markov chain starting in all pairs of initial states. The advantage of the path coupling method 
described in the next section is that it only requires a bound on couplings starting in certain 
pairs of initial states. 


2.2. Path Coupling. The idea of the path coupling method is to view a coupling that starts
in configurations $\sigma$ and $\tau$ as a sequence of couplings that start in neighboring configurations
$(x_{i-1}, x_i)$ such that $(\sigma = x_0, x_1, x_2, \ldots, x_r = \tau)$. Then the contraction of the original coupling
distance can be obtained by proving contraction between neighboring configurations, which is
often easier to show.

Let $\Omega$ be a finite sample space, and suppose $(X_t, Y_t)$ is a coupling of a Markov chain on $\Omega$.
Suppose also there is a neighborhood structure on $\Omega$, and suppose it is transitive in the following
sense: for any $x$ and $y$, there is a neighbor-to-neighbor path

$$x \sim x_1 \sim x_2 \sim \cdots \sim x_{r-1} \sim y,$$

where $u \sim v$ denotes that sites $u$ and $v$ are neighbors.

Let $d(x, y)$ be a metric over $\Omega$ such that $d(x, y) \geq 1$ for any $x \neq y$, and

$$d(x, y) = \min_{\rho : x \to y} \sum_{i=1}^{r} d(x_{i-1}, x_i),$$

where the minimum is taken over all neighbor-to-neighbor paths

$$\rho : x_0 = x \sim x_1 \sim x_2 \sim \cdots \sim x_{r-1} \sim x_r = y$$

of any number of steps $r$. Such a metric is called a path metric. Next, we define the diameter of the
sample space:

$$\mathrm{diam}(\Omega) = \max_{x, y \in \Omega} d(x, y).$$

Finally, the coupling construction allows us to define the transportation metric of Kantorovich
(1942) as follows:

$$d_K(x, y) := E[d(X_{t+1}, Y_{t+1}) \mid X_t = x, Y_t = y].$$

One can check that $d_K(x, y)$ is a metric over $\Omega$.

Path coupling, invented by Bubley and Dyer in 1997, is a method that employs an existing
coupling construction in order to bound the mixing time from above. This method in its standard
form usually requires certain metric contraction between neighbor sites. Specifically, we require
that for any $x \sim y$

$$d_K(x, y) = E[d(X_{t+1}, Y_{t+1}) \mid X_t = x, Y_t = y] \leq (1 - \delta(\Omega))\, d(x, y), \tag{2}$$

where $0 < \delta(\Omega) < 1$ does not depend on $x$ and $y$.

The above contraction inequality (2) has the following implication.
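Proposition 2.4. Suppose inequality (2) is satisfied. Then

$$t_{mix}(\epsilon) \leq \frac{\log \mathrm{diam}(\Omega) - \log \epsilon}{\delta(\Omega)}.$$

Proof. For any $x$ and $y$ in $\Omega$, consider the path metric minimizing path

$$\rho : x_0 = x \sim x_1 \sim x_2 \sim \cdots \sim x_{r-1} \sim x_r = y$$

such that

$$d(x, y) = \sum_{i=1}^{r} d(x_{i-1}, x_i).$$

Then

$$E[d(X_{t+1}, Y_{t+1}) \mid X_t = x, Y_t = y] = d_K(x, y) \leq \sum_{i=1}^{r} d_K(x_{i-1}, x_i) \leq (1 - \delta(\Omega)) \sum_{i=1}^{r} d(x_{i-1}, x_i) = (1 - \delta(\Omega))\, d(x, y).$$

Hence, after $t$ iterations,

$$E[d(X_t, Y_t)] \leq (1 - \delta(\Omega))^t\, d(X_0, Y_0) \leq (1 - \delta(\Omega))^t\, \mathrm{diam}(\Omega)$$

for any initial $(X_0, Y_0)$, and

$$P(X_t \neq Y_t) = P(d(X_t, Y_t) \geq 1) \leq E[d(X_t, Y_t)] \leq (1 - \delta(\Omega))^t\, \mathrm{diam}(\Omega) \leq \epsilon$$

whenever

$$t \geq \frac{\log \mathrm{diam}(\Omega) - \log \epsilon}{-\log(1 - \delta(\Omega))}.$$

Thus, by the Coupling Inequality,

$$t_{mix}(\epsilon) \leq \frac{\log \mathrm{diam}(\Omega) - \log \epsilon}{-\log(1 - \delta(\Omega))} \leq \frac{\log \mathrm{diam}(\Omega) - \log \epsilon}{\delta(\Omega)}.$$

□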



Example. Consider the Ising model over a $d$-dimensional torus $\mathbb{Z}^d / n\mathbb{Z}^d$. There $\Omega = \{-1, +1\}^{n^d}$
is the space of all spin configurations, and for any pair of configurations $\sigma, \tau \in \Omega$, the path metric
$d(\sigma, \tau)$ is the number of discrepancies between them

$$d(\sigma, \tau) = \sum_{x \in \mathbb{Z}^d / n\mathbb{Z}^d} 1\{\sigma_x \neq \tau_x\}.$$

The diameter is $\mathrm{diam}(\Omega) = n^d$. It can be checked that if the inverse temperature parameter $\beta$
satisfies $\tanh(\beta) < \frac{1}{2d}$, the contraction inequality (2) is satisfied with

$$\delta(\Omega) = \frac{1 - 2d \tanh(\beta)}{n^d}.$$

Hence

$$t_{mix}(\epsilon) \leq \frac{\log \mathrm{diam}(\Omega) - \log \epsilon}{\delta(\Omega)} = n^d\, \frac{d \log n - \log \epsilon}{1 - 2d \tanh(\beta)} = O\left(C n^d \log n\right),$$

where $C = \frac{d}{1 - 2d \tanh(\beta)}$.
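The bound of Proposition 2.4 and its specialization to the Ising example are simple enough to evaluate numerically. The following is a small sketch using the formulas stated above; the function names are illustrative assumptions.

```python
import math

def path_coupling_bound(diam, delta, eps):
    """Upper bound (log diam - log eps) / delta on t_mix(eps) from Proposition 2.4."""
    if not (0.0 < delta < 1.0):
        raise ValueError("contraction rate delta must lie in (0, 1)")
    return (math.log(diam) - math.log(eps)) / delta

def ising_torus_bound(n, d, beta, eps):
    """Bound for Glauber dynamics of the Ising model on the d-dimensional torus of side n."""
    if math.tanh(beta) >= 1.0 / (2 * d):
        raise ValueError("requires tanh(beta) < 1/(2d)")
    sites = n ** d                                   # diam(Omega) = n^d
    delta = (1.0 - 2 * d * math.tanh(beta)) / sites  # contraction rate from the example
    return path_coupling_bound(sites, delta, eps)
```

For instance, `ising_torus_bound(10, 2, 0.1, 0.25)` evaluates the bound on a 10 by 10 torus at inverse temperature 0.1.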


The emergence of the path coupling technique [5] has allowed for a greater simplification in 
the use of the coupling argument, as rigorous analysis of coupling can be significantly easier 
when one considers only neighboring configurations. However, the simplification of the path 
coupling technique comes at the cost of the strong assumption that the coupling distance for all 
pairs of neighboring configurations must be contracting. Observe that although the contraction 
between all neighbors is a sufficient condition for the above mixing time bound, it is far from 
being a necessary condition. In fact, this condition is an artifact of the method. 

There had been some successful generalizations of the path coupling method. Specifically in 
m, m and [1] . In m, the path coupling method is generalized to account for contraction 
after a specific number of time-steps, defined as a random variable. In [23] a multi-step non- 
Markovian coupling construction is considered that evolves via partial couplings of variable 
lengths determined by stopping times. In order to bound the coupling time, the authors of 
|24| introduce a technique they call variable length path coupling that further generalizes the 
approach in m- 


2.3. Random-to-Random Shuffling. An example illustrating the idea of path coupling can
be found in the REU project [32] of Jennifer Thompson that was supervised by Yevgeniy
Kovchegov in the summer of 2010 at Oregon State University. There, we consider the shuffling
algorithm whereby on each iteration we select a card uniformly from the deck, remove
it from the deck, and place it in one of the $n$ positions in the deck, selected uniformly and
independently. Each iteration is done independently of the others. This is referred to as
the random-to-random card shuffling algorithm. We need to shuffle the deck so that when we
are done with shuffling the deck each of the $n!$ possible permutations is obtained with probability
close to $1/n!$. Its mixing time can be easily shown to be of order $O(n \log n)$ using the notion of
strong stationary time. For this one would consider the time it takes for each card in the deck to
be selected at least once, and then use the coupon collector problem to prove the $O(n \log n)$ upper
bound on the mixing time. The same coupon collector problem is applied to show that we need
at least order $n \log n$ iterations of the shuffling algorithm to mix the deck. The goal of the REU
project in [32] was to arrive at the $O(n \log n)$ upper bound using the coupling method.

2.3.1. The Coupling. Take two decks of $n$ cards, A and B.

• Randomly choose $i \in \{1, 2, \ldots, n\}$.

• Remove card with label $i$ from each deck.

• Randomly reinsert card $i$ in deck A.

• (1) If the new location of $i$ in A is the top of A, then insert $i$ on the top of B.

(2) If the new location of $i$ in A is directly below card $j$, insert $i$ directly below $j$ in B.

Let $A_t \in S_n$ and $B_t \in S_n$ denote the card orderings (permutations) in decks A and B after $t$
iterations.
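The coupling steps above can be sketched in a few lines. This is a minimal illustration under the assumption that a deck is stored as a list read from top to bottom; the function names and the choice of initial decks are hypothetical.

```python
import random

def coupled_step(deck_a, deck_b, rng):
    """One step of the coupling: move the same card in both decks (in place).

    Card i is removed from both decks, reinserted uniformly in deck A, and placed
    in deck B on top, or directly below the card j that sits directly above it in A.
    """
    n = len(deck_a)
    card = rng.choice(deck_a)              # uniformly chosen label i
    deck_a.remove(card)
    deck_b.remove(card)
    pos = rng.randrange(n)                 # uniform position among the n slots in A
    deck_a.insert(pos, card)
    if pos == 0:
        deck_b.insert(0, card)             # top of A -> top of B
    else:
        above = deck_a[pos - 1]            # card j directly above i in A
        deck_b.insert(deck_b.index(above) + 1, card)

def coupling_time(n, seed=0, max_steps=100000):
    """Run the coupling from sorted vs. reversed order until the decks agree."""
    rng = random.Random(seed)
    deck_a = list(range(1, n + 1))
    deck_b = list(range(n, 0, -1))
    for t in range(max_steps + 1):
        if deck_a == deck_b:
            return t
        coupled_step(deck_a, deck_b, rng)
    return None
```

Since card $j$ is uniform among the other $n - 1$ cards whenever card $i$ is not placed on top, deck B also performs a random-to-random shuffle, and once the two decks agree they stay together.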
A   B

2   4

4   3

1   1

3   2

Figure 1. One configuration of matchings between two decks of $n = 4$ cards.

2.3.2. Computing the coupling time with a laces approach. We introduce the following path
metric $d(\cdot, \cdot) : S_n \times S_n \to \mathbb{Z}_+$ by letting $d(\sigma, \sigma')$ be the minimal number of nearest neighbour
transpositions needed to traverse between the two permutations $\sigma$ and $\sigma'$. For example, for the two
decks A and B in Figure 1, a distance minimizing path connecting the two permutations is given
in Figure 2.

Figure 2. Minimal number of crossings between the two permutations is four.

Note that $d(\sigma, \sigma') \leq \binom{n}{2}$. We consider the quantity $d_t = d(A_t, B_t)$, the distance between our two
decks at time $t$. We want to find the relationship between $E[d_{t+1}]$ and $E[d_t]$.

We consider a $d(\cdot, \cdot)$-metric minimizing path. We call the path taken by a card label a lace.
Thus each lace representing a card label is involved in a certain number of crossings. Let $r_t$ be
the number of crossings per lace, averaged over all $n$ card labels. Since each crossing involves
exactly two laces, we have $d_t = \frac{n r_t}{2}$.
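The crossing counts for the decks of Figure 1 can be checked by hand or with a short script. The sketch below uses the standard fact that the minimal number of adjacent transpositions between two orderings equals the number of pairs of cards that appear in opposite relative order; the function names and the top-to-bottom list encoding of the decks are assumptions made for illustration.

```python
def crossings_per_lace(deck_a, deck_b):
    """Number of crossings each lace (card label) takes part in.

    Two laces cross exactly when the two cards appear in opposite relative
    order in the two decks.
    """
    pos_b = {card: k for k, card in enumerate(deck_b)}
    counts = {card: 0 for card in deck_a}
    for a in range(len(deck_a)):
        for b in range(a + 1, len(deck_a)):
            x, y = deck_a[a], deck_a[b]
            if pos_b[x] > pos_b[y]:          # x above y in A but below y in B
                counts[x] += 1
                counts[y] += 1
    return counts

def distance(deck_a, deck_b):
    """Minimal number of adjacent transpositions turning deck_a into deck_b."""
    return sum(crossings_per_lace(deck_a, deck_b).values()) // 2

deck_a = [2, 4, 1, 3]      # deck A of Figure 1, top to bottom
deck_b = [4, 3, 1, 2]      # deck B of Figure 1, top to bottom
counts = crossings_per_lace(deck_a, deck_b)
d_t = distance(deck_a, deck_b)
r_t = sum(counts.values()) / len(deck_a)
```

Here `d_t` is the total number of crossings and `r_t` the average number of crossings per lace, so the relation between the two can be read off directly.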

The evolution of the path connecting $A_t$ to $B_t$ can be described as follows. At each time step
we pick a lace (corresponding to a card label, say $i$) at random and remove it. For example,
take a minimal path connecting decks A and B in Figure 2 and remove the lace corresponding to
label 3, obtaining Figure 3. Then we reinsert the removed lace. There will be two cases:

(1) With probability $\frac{1}{n}$ we place the lace corresponding to card label $i$ on the top of the deck.
See Figure 4. Then there will be no new crossings.
Figure 3. Removing lace 3 decreases the number of crossings to two.

Figure 4. Placing lace 3 on top does not add new crossings.


(2) Otherwise, we choose a lace $j$ uniformly among the remaining $n - 1$ laces,
and place lace $i$ directly below lace $j$. This case has probability $\frac{n-1}{n}$. Then the number of
additional new crossings is the same as the number of crossings of lace $j$, as in Figure 5.
Here


E[new crossings] 


E [average number of crossings for the remaining laces] 



1 


n — 1 


Figure 5. Inserting lace 3 directly below lace 2 adds the same number of crossings
as lace 2 had.


Then 


Hence 


lE[df+ilX, Bt] 





df 
and therefore


P{At / Bt) = P{dt > 1) < E[dt] 




E[do] < 



whenever 


t > 


—2 log n + log 2 + log e 


2nlogn + 0{n). 



< e 


This provides an upper bound on the mixing time.


3. Gibbs Ensembles and Glauber Dynamics 

In recent years, mixing times of dynamics of statistical mechanical models have been the 
focus of much probability research, drawing interest from researchers in mathematics, physics 
and computer science. The topic is both physically relevant and mathematically rich. But 
up to now, most of the attention has focused on particular models including rigorous results 
for several mean-field models. A few examples are (a) the Curie-Weiss (mean-field Ising) model 
[m El Eni ES] , (b) the mean-field Blume-Capel model [ISIEG], (c) the Curie-Weiss-Potts (mean- 
field Potts) model [HE]- A good survey of the topic of mixing times of statistical mechanical 
models can be found in the recent paper by Cuff et. al. [H]- 

The aggregate path coupling method was developed in [2S] and m to obtain rapid mixing 
results for statistical mechanical models, in particular, those models that undergo a first-order,
discontinuous phase transition. For this class of models, the standard path coupling method 
fails to be applicable. The remainder of this survey is devoted to the exposition of the aggregate 
path coupling method applied to statistical mechanical models. 

As stated in HZ], “In statistical mechanics, one derives macroscopic properties of a substance 
from a probability distribution that describes the complicated interactions among the individual 
constituent particles.” The distribution referred to in this quote is called the Gibbs ensemble or
Gibbs measure, which is defined next.

A configuration of the model has the form $\omega = (\omega_1, \omega_2, \ldots, \omega_n) \in \Lambda^n$, where $\Lambda$ is some finite,
discrete set. We will consider a configuration on a graph with $n$ vertices and let $X_i(\omega) = \omega_i$
denote the spin at vertex $i$. The random variables $X_i$ for $i = 1, 2, \ldots, n$ are independent and
identically distributed with common distribution $\rho$. The interactions among the spins are defined
through the Hamiltonian function $H_n$ and we denote by $M_n(\omega)$ the relevant macroscopic quantity
corresponding to the configuration $\omega$. The lift from the microscopic level of the configurations to
the macroscopic level of $M_n$ is through the interaction representation function $H$ that satisfies

$$H_n(\omega) = n H(M_n(\omega)). \tag{3}$$

Definition 3.1. The Gibbs measure or Gibbs ensemble in statistical mechanics is defined
as

$$P_{n,\beta}(B) = \frac{1}{Z_n(\beta)} \int_B \exp\{-\beta H_n(\omega)\}\, dP_n = \frac{1}{Z_n(\beta)} \int_B \exp\{-\beta n H(M_n(\omega))\}\, dP_n \tag{4}$$

where $P_n$ is the product measure with identical marginals $\rho$ and $Z_n(\beta) = \int_{\Lambda^n} \exp\{-\beta H_n(\omega)\}\, dP_n$
is the partition function. The positive parameter $\beta$ represents the inverse temperature of the
external heat bath.

Definition 3.2. On the configuration space $\Lambda^n$, we define the Glauber dynamics for the class
of spin models considered in this paper. These dynamics yield a reversible Markov chain with
stationary distribution being the Gibbs ensemble $P_{n,\beta}$.

(i) Select a vertex $i$ from the underlying graph uniformly,

(ii) Update the spin at vertex $i$ according to the distribution $P_{n,\beta}$, conditioned on the event
that the spins at all vertices not equal to $i$ remain unchanged.

For more on Glauber dynamics, see [6]. 
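For very small systems the Glauber dynamics of Definition 3.2 can be written out as an explicit transition matrix, which makes the stationarity of the Gibbs ensemble easy to verify. The sketch below assumes a uniform prior $\rho$ on the spin set and uses a Curie-Weiss-type Hamiltonian purely as a hypothetical example; all names are illustrative.

```python
import math
from itertools import product

def glauber_matrix(spins, n, hamiltonian, beta):
    """Transition matrix of heat-bath Glauber dynamics on spins^n.

    A vertex is chosen uniformly; its spin is resampled from the Gibbs measure
    (uniform prior rho assumed) conditioned on the other n - 1 spins.
    Returns the list of states, the matrix P, and the Gibbs probabilities.
    """
    states = list(product(spins, repeat=n))
    index = {s: k for k, s in enumerate(states)}
    weight = {s: math.exp(-beta * hamiltonian(s)) for s in states}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for i in range(n):
            options = [s[:i] + (a,) + s[i + 1:] for a in spins]
            total = sum(weight[o] for o in options)
            for o in options:
                P[index[s]][index[o]] += weight[o] / (n * total)
    z = sum(weight.values())
    gibbs = [weight[s] / z for s in states]
    return states, P, gibbs

def curie_weiss(s):
    """Hypothetical example Hamiltonian H_n = -(sum of spins)^2 / (2n)."""
    return -sum(s) ** 2 / (2 * len(s))

states, P, gibbs = glauber_matrix((-1, 1), 3, curie_weiss, beta=0.7)
```

The example Hamiltonian has the form (3) with $M_n = S_n/n$ and $H(z) = -z^2/2$; any other function of the configuration can be passed in its place.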

An important question of mixing times of dynamics of statistical mechanical models is its 
relationship with the thermodynamic phase transition structure of the system. More specifically, 
as a system undergoes an equilibrium phase transition with respect to some parameter; e.g. 
temperature, how do the corresponding mixing times behave? The answer to this question 
depends on the type of thermodynamic phase transition exhibited by the model. In the next 
section we define the two types of thermodynamic phase transition via the large deviation 
principle of the macroscopic quantity. 

4. Large Deviations and Equilibrium Macrostate Phase Transitions 

The application of the aggregate path coupling method to prove rapid mixing takes advantage 
of large deviations estimates that these models satisfy. In this section, we give a brief summary 
of large deviations theory used in this paper, written in the context of Gibbs ensembles defined 
in the previous section. For a more complete theory of large deviations see for example [l2j and 

m- 

A function $I$ on $\mathbb{R}^q$ is called a rate function if $I$ maps $\mathbb{R}^q$ to $[0, \infty]$ and has compact level
sets.

Definition 4.1. Let $I_\beta$ be a rate function on $\mathbb{R}^q$. The sequence $\{M_n\}$ with respect to the Gibbs
ensemble $P_{n,\beta}$ is said to satisfy the large deviation principle (LDP) on $\mathbb{R}^q$ with rate function
$I_\beta$ if the following two conditions hold.

For any closed subset $F$,

$$\limsup_{n \to \infty} \frac{1}{n} \log P_{n,\beta}\{M_n \in F\} \leq -I_\beta(F) \tag{5}$$

and for any open subset $G$,

$$\liminf_{n \to \infty} \frac{1}{n} \log P_{n,\beta}\{M_n \in G\} \geq -I_\beta(G) \tag{6}$$

where $I_\beta(A) = \inf_{z \in A} I_\beta(z)$.

The LDP upper bound in the above definition implies that values $z$ satisfying $I_\beta(z) > 0$ have
an exponentially small probability of being observed as $n \to \infty$. Hence we define the set of
equilibrium macrostates of the system by

$$\mathcal{E}_\beta = \{z : I_\beta(z) = 0\}.$$
For the class of Gibbs ensembles studied in this survey paper, the set of equilibrium macrostates
exhibits the following general behavior. There exists a phase transition critical value of the
parameter $\beta_c$ such that

(a) for $0 < \beta < \beta_c$, the set $\mathcal{E}_\beta$ consists of a single equilibrium macrostate (single phase); i.e.
$\mathcal{E}_\beta = \{z_\beta\}$

(b) for $\beta_c < \beta$, the set $\mathcal{E}_\beta$ consists of multiple equilibrium macrostates (multiple phase); i.e.
$\mathcal{E}_\beta = \{z_{\beta,1}, z_{\beta,2}, \ldots, z_{\beta,q}\}$

The transition from the single phase to the multiple phase follows one of two general types.

Continuous, second-order phase transition: For all $j = 1, 2, \ldots, q$, $\lim_{\beta \to \beta_c^+} z_{\beta,j} = z_{\beta_c}$

Discontinuous, first-order phase transition: For all $j = 1, 2, \ldots, q$, $\lim_{\beta \to \beta_c^+} z_{\beta,j} \neq z_{\beta_c}$

As mentioned in the Introduction, understanding the relationship between the mixing times of 
the Glauber dynamics and the equilibrium phase transition structure of the corresponding Gibbs 
ensembles is a major motivation for the work discussed in this paper. 

Recent rigorous results for statistical mechanical models that undergo continuous, second- 
order phase transitions, like the famous Ising model, have been published in [291 ng ng. For 
these models, it has been shown that the mixing times undergo a transition at precisely the
thermodynamic phase transition point. In order to show rapid mixing in the subcritical parameter
regime ($\beta < \beta_c$) for these models, the classical path coupling method can be applied directly.

However, for models that exhibit the other type of phase transition: discontinuous, first-order; 
e.g. Potts model with q > 2 jag HO] and the Blume-Capel model isiaiTiiHiigiis] with weak 
interaction, the mixing time transition does not coincide with the thermodynamic equilibrium 
phase transition. 


Discontinuous, first-order phase transitions are more intricate than their counterpart, which 
makes rigorous analysis of these models traditionally more difficult. Furthermore, the more 
complex phase transition structure causes certain parameter regimes of the models to fall outside 
the scope of standard mixing time techniques including the classical path coupling method 
discussed in subsection 2.2. This was the motivation for the development of the aggregate path
coupling method. 


In the following two sections, we define and characterize the aggregate path coupling method
for two distinct classes of statistical mechanical spin models. We begin with the mean-field
Blume-Capel (BC) model, a model ideally suited for the analysis of the relationship between 
the thermodynamic equilibrium behavior and mixing times due to its intricate phase transition 
structure. Specifically, the phase diagram of the BC model includes a curve at which the model 
undergoes a second-order, continuous phase transition, a curve where the model undergoes a 
first-order, discontinuous phase transition, and a tricritical point which separates the two curves. 
Moreover, the BC model clearly illustrates the strength of the aggregate path coupling method 
within the simpler setting where the macroscopic quantity for the model is one dimensional. 

In a later section, we generalize the ideas applied to the BC model and define the aggregate path
coupling method for a large class of statistical mechanical models with macroscopic quantities
in arbitrary dimensions. We end the survey paper with new mixing time results for Glauber
dynamics that converge to the so-called generalized Potts model on the complete graph [25] by
applying the general aggregate path coupling method derived in that section.


5. Mean-field Blume-Capel model 


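The Hamiltonian function on the configuration space $\Lambda^n = \{-1, 0, 1\}^n$ for the mean-field
Blume-Capel model is defined by

$$H_{n,K}(\omega) = \sum_{j=1}^{n} \omega_j^2 - \frac{K}{n} \left( \sum_{j=1}^{n} \omega_j \right)^2$$

for configurations $\omega = (\omega_1, \ldots, \omega_n)$. Here $K$ represents the interaction strength of the model.
Then for inverse temperature $\beta$, the mean-field Blume-Capel model is defined by the sequence
of probability measures

$$P_{n,\beta,K}(\omega) = \frac{1}{Z_n(\beta, K)} \exp[-\beta H_{n,K}(\omega)]$$

where $Z_n(\beta, K) = \sum_{\omega \in \Lambda^n} \exp[-\beta H_{n,K}(\omega)]$ is the normalizing constant called the partition
function.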
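For small $n$ the Blume-Capel measure can be computed exactly by enumerating all $3^n$ configurations, which is a convenient way to get a feel for the model. The following is a brute-force sketch with illustrative function names; it is only practical for small $n$.

```python
import math
from itertools import product

def bc_hamiltonian(omega, K):
    """Mean-field Blume-Capel Hamiltonian: sum of squares minus (K/n) * (total spin)^2."""
    n = len(omega)
    return sum(w * w for w in omega) - (K / n) * sum(omega) ** 2

def bc_gibbs(n, beta, K):
    """Exact Gibbs probabilities on {-1, 0, 1}^n by enumeration (small n only)."""
    configs = list(product((-1, 0, 1), repeat=n))
    weights = [math.exp(-beta * bc_hamiltonian(w, K)) for w in configs]
    z = sum(weights)                       # partition function Z_n(beta, K)
    return {w: wt / z for w, wt in zip(configs, weights)}

def magnetization_law(n, beta, K):
    """Distribution of the magnetization S_n / n under the Gibbs measure."""
    law = {}
    for w, p in bc_gibbs(n, beta, K).items():
        m = sum(w) / n
        law[m] = law.get(m, 0.0) + p
    return law
```

Calling `magnetization_law(n, beta, K)` for a few values of $K$ shows how the mass of $S_n/n$ moves away from zero as the interaction strength grows.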


5.1. Equilibrium Phase Structure. In [23], using large deviation theory [T8|, the authors 
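proved the phase transition structure of the BC model. The analysis of $P_{n,\beta,K}$ was facilitated
by expressing it in the form of a Curie-Weiss (mean-field Ising)-type model. This is done by
absorbing the noninteracting component of the Hamiltonian into the product measure that
assigns the probability $3^{-n}$ to each $\omega \in \Lambda^n$, obtaining

$$P_{n,\beta,K}(d\omega) = \frac{1}{\tilde{Z}_n(\beta, K)} \cdot \exp\left[ n \beta K \left( \frac{S_n(\omega)}{n} \right)^2 \right] P_{n,\beta}(d\omega). \tag{7}$$

In this formula $S_n(\omega)$ equals the total spin $\sum_{j=1}^{n} \omega_j$, $P_{n,\beta}$ is the product measure on $\Lambda^n$ with
identical one-dimensional marginals

$$\rho_\beta(d\omega_j) = \frac{1}{Z(\beta)} \cdot \exp(-\beta \omega_j^2)\, \rho(d\omega_j), \tag{8}$$

$Z(\beta)$ is the normalizing constant $\int_\Lambda \exp(-\beta \omega_j^2)\, \rho(d\omega_j) = 1 + 2e^{-\beta}$ and $\tilde{Z}_n(\beta, K)$ is the
normalizing constant $[Z(\beta)]^n / Z_n(\beta, K)$.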


Although $P_{n,\beta,K}$ has the form of a Curie-Weiss (mean-field Ising) model when rewritten as in
(7), it is much more complicated because of the $\beta$-dependent product measure $P_{n,\beta}$ and the
presence of the parameter $K$. These complications introduce new features to the BC model
described above that are not present in the Curie-Weiss model m- 


The starting point of the analysis of the phase-transition structure of the BC model is the large
deviation principle (LDP) satisfied by the spin per site or magnetization $S_n/n$ with respect to
$P_{n,\beta,K}$. In order to state the form of the rate function, we introduce the cumulant generating
function $c_\beta$ of the measure $\rho_\beta$ defined in (8); for $t \in \mathbb{R}$ this function is defined by

$$c_\beta(t) = \log \int_\Lambda \exp(t \omega_1)\, \rho_\beta(d\omega_1) = \log \left[ \frac{1 + e^{-\beta}(e^{t} + e^{-t})}{1 + 2e^{-\beta}} \right].$$

We also introduce the Legendre-Fenchel transform of $c_\beta$, which is defined for $z \in [-1, 1]$ by

$$J_\beta(z) = \sup_{t \in \mathbb{R}} \{tz - c_\beta(t)\}$$

and is finite for $z \in [-1, 1]$. $J_\beta$ is the rate function in Cramér's theorem, which is the LDP for
$S_n/n$ with respect to the product measures $P_{n,\beta}$ [m, Thm. II.4.1] and is one of the components
of the proof of the LDP for $S_n/n$ with respect to the BC model $P_{n,\beta,K}$. This LDP is stated in
the next theorem and is proved in Theorem 3.3 in [23].

Theorem 5.1. For all $\beta > 0$ and $K > 0$, with respect to $P_{n,\beta,K}$, $S_n/n$ satisfies the large
deviation principle on $[-1, 1]$ with exponential speed $n$ and rate function

$$I_{\beta,K}(z) = J_\beta(z) - \beta K z^2 - \inf_{y \in \mathbb{R}} \{J_\beta(y) - \beta K y^2\}.$$

The LDP in the above theorem implies that those $z \in [-1, 1]$ satisfying $I_{\beta,K}(z) > 0$ have
an exponentially small probability of being observed as $n \to \infty$. Hence we define the set of
equilibrium macrostates by

$$\mathcal{E}_{\beta,K} = \{z \in [-1, 1] : I_{\beta,K}(z) = 0\}.$$


For $z \in \mathbb{R}$ we define

$$G_{\beta,K}(z) = \beta K z^2 - c_\beta(2 \beta K z) \tag{9}$$

and as in [21] and [22] refer to it as the free energy functional of the model. The calculation of
the zeroes of $I_{\beta,K}$ (equivalently, the global minimum points of $J_\beta(z) - \beta K z^2$) is greatly
facilitated by the following observations made in Proposition 3.4 in [23]:

(1) The global minimum points of $J_\beta(z) - \beta K z^2$ coincide with the global minimum points
of $G_{\beta,K}$, which are much easier to calculate.

(2) The minimum values $\min_{z \in \mathbb{R}} \{J_\beta(z) - \beta K z^2\}$ and $\min_{z \in \mathbb{R}} \{G_{\beta,K}(z)\}$ coincide.

Item (1) gives the alternate characterization that

$$\mathcal{E}_{\beta,K} = \{z \in [-1, 1] : z \text{ minimizes } G_{\beta,K}(z)\}. \tag{10}$$
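Observations (1) and (2) can be checked numerically for given parameter values. The sketch below evaluates $c_\beta$, $J_\beta$ and $G_{\beta,K}$ and minimizes over a grid; the closed form used for $J_\beta$ is an assumption derived here by solving $c_\beta'(t) = z$ as a quadratic equation in $u = e^t$, and the function names and grid parameters are illustrative.

```python
import math

def c(beta, t):
    """Cumulant generating function c_beta(t) of rho_beta."""
    a = math.exp(-beta)
    return math.log((1 + a * (math.exp(t) + math.exp(-t))) / (1 + 2 * a))

def J(beta, z):
    """Legendre-Fenchel transform of c_beta at z in (-1, 1).

    The maximizing t solves c_beta'(t) = z, i.e. with u = exp(t) and a = exp(-beta):
    a (1 - z) u^2 - z u - a (1 + z) = 0, whose positive root is used below.
    """
    a = math.exp(-beta)
    u = (z + math.sqrt(z * z + 4 * a * a * (1 - z * z))) / (2 * a * (1 - z))
    t = math.log(u)
    return t * z - c(beta, t)

def G(beta, K, z):
    """Free energy functional G_{beta,K}(z) = beta K z^2 - c_beta(2 beta K z)."""
    return beta * K * z * z - c(beta, 2 * beta * K * z)

def grid_minimum(f, lo=-0.999, hi=0.999, steps=4001):
    """Return (argmin, min) of f over an evenly spaced grid on [lo, hi]."""
    zs = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    z_best = min(zs, key=f)
    return z_best, f(z_best)
```

For example, `grid_minimum(lambda z: G(1.0, 2.0, z))` and `grid_minimum(lambda z: J(1.0, z) - 2.0 * z * z)` can be compared to see that the minimum values and the locations of the minimizers (up to sign) agree to within grid accuracy.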


The free energy functional $G_{\beta,K}$ exhibits two distinct behaviors depending on whether $\beta \leq
\beta_c = \log 4$ or $\beta > \beta_c$. In the first case, the behavior is similar to the Curie-Weiss (mean-field
Ising) model. Specifically, there exists a critical value $K_c^{(2)}(\beta)$ defined in (11) such that for
$K < K_c^{(2)}(\beta)$, $G_{\beta,K}$ has a single minimum point at $z = 0$. At the critical value $K = K_c^{(2)}(\beta)$,
$G_{\beta,K}$ develops symmetric non-zero minimum points and a local maximum point at $z = 0$. This
behavior corresponds to a continuous, second-order phase transition and is illustrated in Figure 6.

Figure 6. The free-energy functional $G_{\beta,K}$ for $\beta \leq \beta_c$

On the other hand, for $\beta > \beta_c$, $G_{\beta,K}$ undergoes two transitions at the values denoted by $K_1(\beta)$
and $K_c^{(1)}(\beta)$. For $K < K_1(\beta)$, $G_{\beta,K}$ again possesses a single minimum point at $z = 0$. At the
first critical value $K_1(\beta)$, $G_{\beta,K}$ develops symmetric non-zero local minimum points in addition
to the global minimum point at $z = 0$. These local minimum points are referred to as metastable
states and we refer to $K_1(\beta)$ as the metastable critical value. This value is defined implicitly in
Lemma 3.9 of [23] as the unique value of $K$ for which there exists a unique $\tilde{z} > 0$ such that

$$G'_{\beta,K_1(\beta)}(\tilde{z}) = 0 \quad \text{and} \quad G''_{\beta,K_1(\beta)}(\tilde{z}) = 0.$$

As $K$ increases from $K_1(\beta)$ to $K_c^{(1)}(\beta)$, the values of $G_{\beta,K}$ at the local minimum points decrease
until at $K = K_c^{(1)}(\beta)$ they reach zero and $G_{\beta,K}$ possesses three global minimum points.
Therefore, for $\beta > \beta_c$, the BC model undergoes a phase transition at $K = K_c^{(1)}(\beta)$, which is defined
implicitly in [23]. Lastly, for $K > K_c^{(1)}(\beta)$, the values at the symmetric non-zero minimum points drop
below zero and thus $G_{\beta,K}$ has two symmetric non-zero global minimum points. This behavior
corresponds to a discontinuous, first-order phase transition and is illustrated in Figure 7.

Figure 7. The free-energy functional $G_{\beta,K}$ for $\beta > \beta_c$
In the next two theorems, the structure of $\mathcal{E}_{\beta,K}$ corresponding to the behavior of $G_{\beta,K}$ just
described is stated, which depends on the relationship between $\beta$ and the critical value $\beta_c =
\log 4$. We first describe $\mathcal{E}_{\beta,K}$ for $0 < \beta \leq \beta_c$ and then for $\beta > \beta_c$. In the first case $\mathcal{E}_{\beta,K}$
undergoes a continuous bifurcation as $K$ increases through the critical value $K_c^{(2)}(\beta)$ defined in
(11); physically, this bifurcation corresponds to a second-order phase transition. The following
theorem is proved in Theorem 3.6 in [23].

Theorem 5.2. For $0 < \beta \leq \beta_c$, we define

$$K_c^{(2)}(\beta) = \frac{1}{2 \beta c_\beta''(0)} = \frac{e^{\beta} + 2}{4 \beta}. \tag{11}$$

For these values of $\beta$, $\mathcal{E}_{\beta,K}$ has the following structure.

(a) For $0 < K \leq K_c^{(2)}(\beta)$, $\mathcal{E}_{\beta,K} = \{0\}$.

(b) For $K > K_c^{(2)}(\beta)$, there exists $z(\beta, K) > 0$ such that $\mathcal{E}_{\beta,K} = \{\pm z(\beta, K)\}$.

(c) $z(\beta, K)$ is a positive, increasing, continuous function for $K > K_c^{(2)}(\beta)$, and as $K \to
(K_c^{(2)}(\beta))^+$, $z(\beta, K) \to 0$. Therefore, $\mathcal{E}_{\beta,K}$ exhibits a continuous bifurcation at $K_c^{(2)}(\beta)$.

For $\beta \in (0, \beta_c)$, the curve $(\beta, K_c^{(2)}(\beta))$ is the curve of second-order critical points. As we will see
in a moment, for $\beta \in (\beta_c, \infty)$ the BC model also has a curve of first-order critical points, which
we denote by $(\beta, K_c^{(1)}(\beta))$.
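The closed form in (11) follows from $c_\beta''(0) = 2e^{-\beta}/(1 + 2e^{-\beta})$, the variance of $\rho_\beta$, and the critical value is exactly where the curvature of $G_{\beta,K}$ at $z = 0$ changes sign. The short sketch below checks both facts numerically; the function names are illustrative assumptions.

```python
import math

def c_second_at_zero(beta):
    """c_beta''(0): the variance of rho_beta, equal to 2 e^{-beta} / (1 + 2 e^{-beta})."""
    a = math.exp(-beta)
    return 2 * a / (1 + 2 * a)

def K_c2(beta):
    """Second-order critical value K_c^{(2)}(beta) = 1 / (2 beta c_beta''(0))."""
    return 1.0 / (2 * beta * c_second_at_zero(beta))

def G_second_at_zero(beta, K):
    """G_{beta,K}''(0) = 2 beta K - (2 beta K)^2 c_beta''(0)."""
    return 2 * beta * K - (2 * beta * K) ** 2 * c_second_at_zero(beta)
```

A positive value of `G_second_at_zero(beta, K)` means $z = 0$ is a local minimum of $G_{\beta,K}$, and a negative value means it is a local maximum, which matches the bifurcation at $K_c^{(2)}(\beta)$ described in Theorem 5.2.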


We now describe $\mathcal{E}_{\beta,K}$ for $\beta > \beta_c$. In this case $\mathcal{E}_{\beta,K}$ undergoes a discontinuous bifurcation as $K$
increases through an implicitly defined critical value. Physically, this bifurcation corresponds to
a first-order phase transition. The following theorem is proved in Theorem 3.8 in [23].

Theorem 5.3. For all $\beta > \beta_c$, $\mathcal{E}_{\beta,K}$ has the following structure in terms of the quantity $K_c^{(1)}(\beta)$
defined implicitly for $\beta > \beta_c$ on page 2231 of [23].

(a) For $0 < K < K_c^{(1)}(\beta)$, $\mathcal{E}_{\beta,K} = \{0\}$.

(b) There exists $z(\beta, K_c^{(1)}(\beta)) > 0$ such that $\mathcal{E}_{\beta,K_c^{(1)}(\beta)} = \{0, \pm z(\beta, K_c^{(1)}(\beta))\}$.

(c) For $K > K_c^{(1)}(\beta)$ there exists $z(\beta, K) > 0$ such that $\mathcal{E}_{\beta,K} = \{\pm z(\beta, K)\}$.

(d) $z(\beta, K)$ is a positive, increasing, continuous function for $K \geq K_c^{(1)}(\beta)$, and as $K \to
K_c^{(1)}(\beta)^+$, $z(\beta, K) \to z(\beta, K_c^{(1)}(\beta)) > 0$. Therefore, $\mathcal{E}_{\beta,K}$ exhibits a discontinuous bifurcation at
$K_c^{(1)}(\beta)$.


The phase diagram of the BC model is depicted in Figure 8. The LDP stated in Theorem 5.1
implies the following weak convergence result used in the proof of rapid mixing in the first-order,
discontinuous phase transition region. It is part (a) of Theorem 6.5 in [23].

Theorem 5.4. For $\beta$ and $K$ for which $\mathcal{E}_{\beta,K} = \{0\}$,

\[ P_{n,\beta,K}\{S_n/n \in dx\} \Longrightarrow \delta_0 \quad \text{as } n \to \infty. \]


We end this section with a final result that was not included in the original paper [23] but will
be used in the proof of the slow mixing result for the BC model. The result states that not only
do the global minimum points of $G_{\beta,K}$ and $I_{\beta,K}$ coincide, but so do the local minimum points.
Figure 8. Equilibrium phase transition structure of the mean-field Blume-Capel model 

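Lemma 5.5. In the case where $G_{\beta,K}$ and $I_{\beta,K}$ are strictly convex at their minimum points, a
point $z$ is a local minimum point of $G_{\beta,K}$ if and only if it is a local minimum point of $I_{\beta,K}$.

Proof. Assume that $z$ is a local minimum point of $G_{\beta,K}$. Then $z$ is a critical point of $G_{\beta,K}$, which
implies that $z = c'_\beta(2\beta K z)$. By the theory of Legendre-Fenchel transforms, $J'_\beta(z) = (c'_\beta)^{-1}(z)$
and thus

\[ I'_{\beta,K}(z) = J'_\beta(z) - 2\beta K z = (c'_\beta)^{-1}(z) - 2\beta K z = 0. \]

Next, since $z$ is a local minimum point of $G_{\beta,K}$,

\[ G''_{\beta,K}(z) > 0 \iff c''_\beta(2\beta K z) < \frac{1}{2\beta K}. \]

Therefore,

\[ I''_{\beta,K}(z) = J''_\beta(z) - 2\beta K = \frac{1}{c''_\beta(2\beta K z)} - 2\beta K > 0, \]

and we conclude that $z$ is a local minimum point of $I_{\beta,K}$. The other direction is obtained by
reversing the argument. □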


5.2. Glauber Dynamics. The Glauber dynamics, defined in general earlier in the paper, for the mean-
field Blume-Capel model evolve by selecting a vertex $i$ at random and updating the spin at $i$
according to the distribution $P_{n,\beta,K}$, conditioned to agree with the spins at all vertices not equal
to $i$. If the current configuration is $\omega$ and vertex $i$ is selected, then the probability that the spin
at $i$ is updated to $+1$ is equal to

\[ p_{+1}(\omega,i) = \frac{e^{2\beta K S(\omega,i)/n}}{e^{2\beta K S(\omega,i)/n} + e^{\beta - (\beta K)/n} + e^{-2\beta K S(\omega,i)/n}}, \tag{12} \]

where $S(\omega,i) = \sum_{j \ne i} \omega_j$ is the total spin of the neighboring vertices of $i$. Similarly, the
probabilities of $i$ updating to $0$ and $-1$ are

\[ p_{0}(\omega,i) = \frac{e^{\beta - (\beta K)/n}}{e^{2\beta K S(\omega,i)/n} + e^{\beta - (\beta K)/n} + e^{-2\beta K S(\omega,i)/n}} \tag{13} \]

and

\[ p_{-1}(\omega,i) = \frac{e^{-2\beta K S(\omega,i)/n}}{e^{2\beta K S(\omega,i)/n} + e^{\beta - (\beta K)/n} + e^{-2\beta K S(\omega,i)/n}}. \tag{14} \]

$p_{+1}(\omega,i)$ is increasing with respect to $S(\omega,i)$, $p_{-1}(\omega,i)$ is decreasing with respect to $S(\omega,i)$, and
$p_0(\omega,i)$ is decreasing for $S(\omega,i) > 0$ and increasing for $S(\omega,i) < 0$.
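The update probabilities (12)-(14) can be sketched in code as follows. The second function recomputes them directly from Gibbs weights, under the assumption that the Hamiltonian is $H_{n,K}(\omega) = \sum_j \omega_j^2 - (K/n)(\sum_j \omega_j)^2$ (the form consistent with (12)-(14); it is not restated in this section). All function names are illustrative.

```python
import math


def update_probabilities(s: int, n: int, beta: float, K: float):
    """Return (p_minus, p_zero, p_plus) from (12)-(14).

    s is S(omega, i), the total spin of all vertices other than i.
    """
    a = 2.0 * beta * K * s / n
    w_plus = math.exp(a)
    w_zero = math.exp(beta - beta * K / n)
    w_minus = math.exp(-a)
    total = w_plus + w_zero + w_minus
    return w_minus / total, w_zero / total, w_plus / total


def conditional_from_hamiltonian(s: int, n: int, beta: float, K: float):
    """Same probabilities, computed directly from Gibbs weights exp(-beta * H).

    Assumes H(omega) = sum_j omega_j^2 - (K / n) * (sum_j omega_j)^2; terms that
    do not depend on the spin at i cancel in the normalization and are dropped.
    """
    log_w = [-beta * (x * x - (K / n) * (s + x) ** 2) for x in (-1, 0, 1)]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return tuple(v / total for v in w)
```

Agreement of the two functions for every admissible value of $S(\omega,i)$ confirms that (12)-(14) are the conditional Gibbs probabilities under the stated Hamiltonian.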


A classical tool in proving rapid mixing for Markov chains defined on graphs, including the
Glauber dynamics of statistical mechanical models, is the path coupling technique discussed in
subsection 2.2. It will be shown that this technique can be directly applied to the BC model
in the second-order, continuous phase transition region but fails in a subset of the first-order,
discontinuous phase transition region. It is for the latter region that we developed the aggregate
path coupling method to prove rapid mixing. First, the standard path coupling method for the
BC model is introduced in the next section.


5.3. Path Coupling. We begin by setting up the coupling rules for the Glauber dynamics of
the mean-field Blume-Capel model. Define the path metric $\rho$ on $\Omega^n = \{-1,0,1\}^n$ by

\[ \rho(\sigma,\tau) = \sum_{j=1}^{n} |\sigma_j - \tau_j|. \tag{15} \]

Remark 5.6. In the original paper [26] on the mixing times of the mean-field Blume-Capel
model, the incorrect path metric was used. In that paper, the path metric was defined by

\[ \rho(\sigma,\tau) = \sum_{j=1}^{n} 1\{\sigma_j \ne \tau_j\}. \]

With the correct metric defined in (15), the proofs in [26] remain the same.


Let $\sigma$ and $\tau$ be two configurations with $\rho(\sigma,\tau) = 1$; i.e. $\sigma$ and $\tau$ are neighboring configurations.
The spins of $\sigma$ and $\tau$ agree everywhere except at a single vertex $i$, where either $\sigma_i = 0$ and $\tau_i \ne 0$,
or $\sigma_i \ne 0$ and $\tau_i = 0$. Assume that $\sigma_i = 0$ and $\tau_i = 1$. We next describe the path coupling
$(X,Y)$ of one step of the Glauber dynamics starting in configuration $\sigma$ with one starting in
configuration $\tau$. Pick a vertex $k$ uniformly at random. We use a single random variable as the
common source of noise to update both chains, so the two chains agree as often as possible. In
particular, let $U$ be a uniform random variable on $[0,1]$ and set

\[ X(k) = \begin{cases} -1 & \text{if } 0 \le U \le p_{-1}(\sigma,k) \\ 0 & \text{if } p_{-1}(\sigma,k) < U \le p_{-1}(\sigma,k) + p_0(\sigma,k) \\ +1 & \text{if } p_{-1}(\sigma,k) + p_0(\sigma,k) < U \le 1 \end{cases} \]

and

\[ Y(k) = \begin{cases} -1 & \text{if } 0 \le U \le p_{-1}(\tau,k) \\ 0 & \text{if } p_{-1}(\tau,k) < U \le p_{-1}(\tau,k) + p_0(\tau,k) \\ +1 & \text{if } p_{-1}(\tau,k) + p_0(\tau,k) < U \le 1 \end{cases} \]

Set $X(j) = \sigma_j$ and $Y(j) = \tau_j$ for $j \ne k$.

Since $\sigma_i < \tau_i$, for all $j \ne i$, $S(\sigma,j) < S(\tau,j)$ and thus

\[ p_{+1}(\tau,k) > p_{+1}(\sigma,k) \quad \text{and} \quad p_{-1}(\tau,k) < p_{-1}(\sigma,k). \]

The path metric $\rho$ on the coupling above takes on the following possible values.

\[ \rho(X,Y) = \begin{cases} 0 & \text{if } k = i \\ 1 & \text{if } k \ne i \text{ and both chains update the same} \\ 2 & \text{if } k \ne i \text{ and the chains update differently} \end{cases} \]

Note that since $\sigma$ and $\tau$ are neighboring configurations, $\rho(X,Y) \ne 3$ because the update
probabilities of $X$ and $Y$ are sufficiently close.
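The coupling just described can be sketched as follows: both chains update the same vertex $k$ and read off their new spin from the same uniform value $u$. This is a minimal illustration, with function names of our choosing; in a simulation one would draw $k$ uniformly from the vertices and $u$ uniformly from $[0,1]$ at each step.

```python
import math


def update_probabilities(s, n, beta, K):
    """(p_minus, p_zero, p_plus) from (12)-(14); s = S(omega, k)."""
    a = 2.0 * beta * K * s / n
    w_plus, w_zero, w_minus = math.exp(a), math.exp(beta - beta * K / n), math.exp(-a)
    total = w_plus + w_zero + w_minus
    return w_minus / total, w_zero / total, w_plus / total


def spin_from_uniform(u, p_minus, p_zero):
    """Map the shared uniform value u to a spin using the thresholds in the text."""
    if u <= p_minus:
        return -1
    if u <= p_minus + p_zero:
        return 0
    return 1


def coupled_update(sigma, tau, k, u, beta, K):
    """One coupled step (X, Y): both chains update vertex k using the same u."""
    n = len(sigma)
    x, y = list(sigma), list(tau)
    for old, new in ((sigma, x), (tau, y)):
        p_minus, p_zero, _ = update_probabilities(sum(old) - old[k], n, beta, K)
        new[k] = spin_from_uniform(u, p_minus, p_zero)
    return x, y


def rho(sigma, tau):
    """Path metric (15)."""
    return sum(abs(a - b) for a, b in zip(sigma, tau))
```

Starting from neighboring configurations, the resulting distance is 0 when the differing vertex itself is selected and 1 or 2 otherwise, matching the case analysis above.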

The application of the path coupling technique to prove rapid mixing is dependent on whether
the mean coupling distance with respect to the path metric $\rho$, denoted by $E_{\sigma,\tau}[\rho(X,Y)]$, contracts
over all pairs of neighboring configurations.

In the lemma below and following corollary, we derive a working form for the mean coupling
distance.


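Lemma 5.7. Let $\rho$ be the path metric defined in (15) and $(X,Y)$ be the path coupling of one
step of the Glauber dynamics of the mean-field Blume-Capel model where $X$ and $Y$ start in
neighboring configurations $\sigma$ and $\tau$. Define

\[ \varphi_{\beta,K}(x) = \frac{2\sinh\left(\frac{2\beta K}{n}x\right)}{2\cosh\left(\frac{2\beta K}{n}x\right) + e^{\beta - \frac{\beta K}{n}}}. \tag{16} \]

Then

\[ E_{\sigma,\tau}[\rho(X,Y)] = \frac{n-1}{n} + \frac{(n-1)}{n}\left[\varphi_{\beta,K}(S_n(\tau)) - \varphi_{\beta,K}(S_n(\sigma))\right] + O\left(\frac{1}{n^2}\right). \]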


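Proof. Let $n_{-1}$, $n_0$ and $n_{+1}$ denote the number of $-1$, $0$ and $+1$ spins, respectively, in
configuration $\sigma$, not including the spin at vertex $i$, where the configurations differ. Note that
$n_{-1} + n_0 + n_{+1} = n - 1$.

Define $\varepsilon(-1)$ to be the probability that $X$ and $Y$ update differently when the chosen vertex
$k \ne i$ is a $-1$ spin. Similarly, define $\varepsilon(0)$ and $\varepsilon(+1)$. Then the mean coupling distance can be
expressed as

\begin{align*}
E_{\sigma,\tau}[\rho(X,Y)] &= \frac{n_{-1}}{n}(1 - \varepsilon(-1)) + \frac{n_0}{n}(1 - \varepsilon(0)) + \frac{n_{+1}}{n}(1 - \varepsilon(+1)) \\
&\quad + 2\left[\frac{n_{-1}}{n}\varepsilon(-1) + \frac{n_0}{n}\varepsilon(0) + \frac{n_{+1}}{n}\varepsilon(+1)\right] \\
&= \frac{n-1}{n} + \frac{n_{-1}}{n}\varepsilon(-1) + \frac{n_0}{n}\varepsilon(0) + \frac{n_{+1}}{n}\varepsilon(+1).
\end{align*}

The probability that $X$ and $Y$ update differently when the chosen vertex $k \ne i$ is a $-1$ spin is
given by

\begin{align*}
\varepsilon(-1) &= [p_{-1}(\sigma,k) - p_{-1}(\tau,k)] + [(p_{-1}(\sigma,k) + p_0(\sigma,k)) - (p_{-1}(\tau,k) + p_0(\tau,k))] \\
&= [p_{+1}(\tau,k) - p_{+1}(\sigma,k)] + [p_{-1}(\sigma,k) - p_{-1}(\tau,k)] \\
&= [p_{+1}(\tau,k) - p_{-1}(\tau,k)] + [p_{-1}(\sigma,k) - p_{+1}(\sigma,k)] \\
&= \frac{2\sinh\left(\frac{2\beta K}{n}(S_n(\tau)+1)\right)}{2\cosh\left(\frac{2\beta K}{n}(S_n(\tau)+1)\right) + e^{\beta - \frac{\beta K}{n}}} - \frac{2\sinh\left(\frac{2\beta K}{n}(S_n(\sigma)+1)\right)}{2\cosh\left(\frac{2\beta K}{n}(S_n(\sigma)+1)\right) + e^{\beta - \frac{\beta K}{n}}} \\
&= \varphi_{\beta,K}(S_n(\tau)+1) - \varphi_{\beta,K}(S_n(\sigma)+1) \\
&= \varphi_{\beta,K}(S_n(\tau)) - \varphi_{\beta,K}(S_n(\sigma)) + O\left(\frac{1}{n^2}\right).
\end{align*}

Similarly, we have

\[ \varepsilon(0) = \varphi_{\beta,K}(S_n(\tau)) - \varphi_{\beta,K}(S_n(\sigma)) \]

and

\[ \varepsilon(+1) = \varphi_{\beta,K}(S_n(\tau)-1) - \varphi_{\beta,K}(S_n(\sigma)-1) = \varphi_{\beta,K}(S_n(\tau)) - \varphi_{\beta,K}(S_n(\sigma)) + O\left(\frac{1}{n^2}\right), \]

and the proof is complete. □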
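The lemma can be checked numerically. The sketch below computes the mean coupling distance exactly, by summing over the selected vertex $k$ and using the probabilities $\varepsilon(\cdot)$ from the proof, and compares it with the leading terms in Lemma 5.7. It assumes $\tau$ is obtained from $\sigma$ by raising the spin at vertex $i$ from $0$ to $1$; the function names are illustrative.

```python
import math


def phi(x, n, beta, K):
    """phi_{beta,K}(x) from (16)."""
    a = 2.0 * beta * K * x / n
    return 2.0 * math.sinh(a) / (2.0 * math.cosh(a) + math.exp(beta - beta * K / n))


def exact_mean_distance(sigma, i, beta, K):
    """Exact E[rho(X, Y)] for tau = sigma with the spin at i raised from 0 to 1."""
    n = len(sigma)
    assert sigma[i] == 0
    s_sigma = sum(sigma)
    s_tau = s_sigma + 1
    total = 0.0
    for k in range(n):
        if k == i:
            continue  # both chains see the same neighbours, so rho(X, Y) = 0
        # Probability that the two chains update vertex k differently.
        eps_k = phi(s_tau - sigma[k], n, beta, K) - phi(s_sigma - sigma[k], n, beta, K)
        total += 1.0 + eps_k  # distance 1 at vertex i, plus 1 more w.p. eps_k
    return total / n


def lemma_5_7_approximation(sigma, i, beta, K):
    """Leading terms of Lemma 5.7, without the O(1/n^2) remainder."""
    n = len(sigma)
    s_sigma = sum(sigma)
    return (n - 1) / n * (1.0 + phi(s_sigma + 1, n, beta, K) - phi(s_sigma, n, beta, K))
```

For moderate $n$ the two values differ by a quantity of order $1/n^2$, as the lemma states.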

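For the function $c_\beta$ defined previously, we have

\[ \varphi_{\beta,K}(x) = c'_\beta\left(\frac{2\beta K}{n}x\right)\left(1 + O\left(\frac{1}{n}\right)\right), \]

which yields the following corollary.

Corollary 5.8. Let $\rho$ be the path metric defined in (15) and $(X,Y)$ be the path coupling where
$X$ and $Y$ start in neighboring configurations $\sigma$ and $\tau$. Then

\[ E_{\sigma,\tau}[\rho(X,Y)] = \frac{n-1}{n} + \frac{(n-1)}{n}\left[c'_\beta\left(2\beta K\frac{S_n(\tau)}{n}\right) - c'_\beta\left(2\beta K\frac{S_n(\sigma)}{n}\right)\right] + O\left(\frac{1}{n^2}\right). \]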

By the above corollary, we conclude that the mean coupling distance of a coupling starting in
neighboring configurations contracts; i.e. $E_{\sigma,\tau}[\rho(X,Y)] < \rho(\sigma,\tau) = 1$, if

\[ c'_\beta\left(2\beta K\frac{S_n(\tau)}{n}\right) - c'_\beta\left(2\beta K\frac{S_n(\sigma)}{n}\right) \approx 2\beta K\left[\frac{S_n(\tau)}{n} - \frac{S_n(\sigma)}{n}\right] c''_\beta\left(2\beta K\frac{S_n(\sigma)}{n}\right) < \frac{1}{n-1}. \]

Since $\sigma$ and $\tau$ are neighboring configurations and $S_n(\tau) > S_n(\sigma)$, this is equivalent to

\[ c''_\beta\left(2\beta K\frac{S_n(\sigma)}{n}\right) < \frac{1}{2\beta K}. \tag{17} \]

Therefore, contraction of the mean coupling distance, and thus rapid mixing, depends on the
concavity behavior of the function $c'_\beta$. This is also precisely what determines the type of
thermodynamic equilibrium phase transition (continuous, second-order versus discontinuous,
first-order) that is exhibited by the mean-field Blume-Capel model. We state the concavity behavior
of $c'_\beta$ in the next theorem, which is proved in Theorem 3.5 in [23]. The results of the theorem
are depicted in Figure 9.
Figure 9. Behavior of $c'_\beta(w)$ for large and small $\beta$ (first panel: $\beta \le \beta_c$; second panel: $\beta > \beta_c$).


Theorem 5.9. For $\beta > \beta_c = \log 4$ define

\[ w_c(\beta) = \cosh^{-1}\left(\frac{1}{2}e^{\beta} - 4e^{-\beta}\right) > 0. \tag{18} \]

The following conclusions hold.

(a) For $0 < \beta \le \beta_c$, $c'_\beta(w)$ is strictly concave for $w > 0$.

(b) For $\beta > \beta_c$, $c'_\beta(w)$ is strictly convex for $0 < w < w_c(\beta)$ and $c'_\beta(w)$ is strictly concave for
$w > w_c(\beta)$.
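The shape described in Theorem 5.9 can be explored numerically. The sketch below uses $c'_\beta(w) = 2\sinh w/(e^\beta + 2\cosh w)$, the form consistent with (16) and (11), together with its derivative, and evaluates the inflection point (18). The last function tests the sufficient condition $\sup_w c''_\beta(w) < 1/(2\beta K)$ for (17); taking the supremum over all $w \ge 0$ is a conservative simplification of ours, and all names are illustrative.

```python
import math


def c_prime(w, beta):
    """c'_beta(w) = 2 sinh(w) / (e^beta + 2 cosh(w))."""
    return 2.0 * math.sinh(w) / (math.exp(beta) + 2.0 * math.cosh(w))


def c_second(w, beta):
    """c''_beta(w) = (2 e^beta cosh(w) + 4) / (e^beta + 2 cosh(w))^2."""
    d = math.exp(beta) + 2.0 * math.cosh(w)
    return (2.0 * math.exp(beta) * math.cosh(w) + 4.0) / d ** 2


def w_c(beta):
    """Inflection point (18) of c'_beta; meaningful only for beta > log 4."""
    arg = 0.5 * math.exp(beta) - 4.0 * math.exp(-beta)
    return math.acosh(max(arg, 1.0))  # clamp guards against rounding near beta_c


def sup_c_second(beta):
    """Supremum of c''_beta over w >= 0, read off from Theorem 5.9."""
    return c_second(w_c(beta) if beta > math.log(4.0) else 0.0, beta)


def contracts_for_all_neighbours(beta, K):
    """Sufficient condition sup c''_beta < 1 / (2 beta K) for (17) to hold."""
    return sup_c_second(beta) < 1.0 / (2.0 * beta * K)
```

For $\beta \le \beta_c$ the maximum slope of $c'_\beta$ sits at the origin, while for $\beta > \beta_c$ it moves out to $w_c(\beta)$, which is why the neighbor-by-neighbor contraction condition becomes harder to satisfy.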


By part (a) of the above theorem, for $\beta \le \beta_c$, $c''_\beta(x) \le c''_\beta(0) = 1/(2\beta K_c^{(2)}(\beta))$. Therefore,
by (17), the mean coupling distance contracts between all pairs of neighboring states whenever

\[ K < K_c^{(2)}(\beta). \]

By contrast, for $\beta > \beta_c$, we will show that rapid mixing occurs whenever $K < K_1(\beta)$ where $K_1(\beta)$
is the metastable critical value introduced in Subsection 5.1 and depicted in the accompanying figure.
However, since the supremum

\[ \sup_{[-1,1]} c''_\beta(x) > \frac{1}{2\beta K_1(\beta)}, \]

the condition $K < K_1(\beta)$ is not sufficient for (17) to hold. That is, $K < K_1(\beta)$ does not imply the
contraction of the mean coupling distance between all pairs of neighboring states. However, we prove
rapid mixing for all $K < K_1(\beta)$ in Subsection 5.5 by using an extension to the path coupling method
that we refer to as aggregate path coupling.


We now prove the mixing times for the mean-field Blume-Capel model, which vary depending
on the parameter values $(\beta, K)$ and their position with respect to the thermodynamic phase
transition curves. We begin with the case $\beta \le \beta_c$, where the model undergoes a continuous,
second-order phase transition, and $K < K_c^{(2)}(\beta)$, which corresponds to the single phase region.

5.4. Standard Path Coupling in the Continuous Phase Transition Region. We begin
by stating the standard path coupling argument used to prove rapid mixing for the mean-field
Blume-Capel model in the continuous, second-order phase transition region. The result is proved
in Proposition 2.4.


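Theorem 5.10. Suppose the state space $\Omega$ of a Markov chain is the vertex set of a graph with
path metric $\rho$. Suppose that for each edge $\{\sigma,\tau\}$ there exists a coupling $(X,Y)$ of the distributions
$P(\sigma,\cdot)$ and $P(\tau,\cdot)$ such that

\[ E_{\sigma,\tau}[\rho(X,Y)] \le \rho(\sigma,\tau)e^{-\alpha} \quad \text{for some } \alpha > 0. \]

Then

\[ t_{mix}(\varepsilon) \le \left\lceil \frac{-\log(\varepsilon) + \log(\mathrm{diam}(\Omega))}{\alpha} \right\rceil. \]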


In this section, we assume $\beta \le \beta_c$, which implies that the BC model undergoes a continuous,
second-order phase transition at $K = K_c^{(2)}(\beta)$ defined in (11). By Theorem 5.9, for $\beta \le \beta_c$,
$c'_\beta(x)$ is concave for $x > 0$. See the first graph of Figure 9 as reference. We next state and prove
the rapid mixing result for the mean-field Blume-Capel model in the second-order, continuous
phase transition regime.

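Theorem 5.11. Let $t_{mix}(\varepsilon)$ be the mixing time for the Glauber dynamics of the mean-field
Blume-Capel model on $n$ vertices and $K_c^{(2)}(\beta)$ the continuous phase transition curve defined in
(11). Then for $\beta \le \beta_c = \log 4$ and $K < K_c^{(2)}(\beta)$,

\[ t_{mix}(\varepsilon) \le \frac{n}{\alpha}\left(\log n + \log(1/\varepsilon)\right) \]

for any $\alpha \in \left(0, \frac{K_c^{(2)}(\beta) - K}{K_c^{(2)}(\beta)}\right)$ and $n$ sufficiently large.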
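To get a feel for the size of the bound in Theorem 5.11, the following sketch evaluates its right-hand side for given parameters. It is only an evaluation of the formula, not a statement about how large $n$ must be for the theorem to apply; the default choice of $\alpha$ (the midpoint of the admissible interval) is an arbitrary illustrative assumption.

```python
import math


def k_c2(beta):
    """K_c^(2)(beta) from (11)."""
    return (math.exp(beta) + 2.0) / (4.0 * beta)


def mixing_time_bound(n, beta, K, eps, alpha=None):
    """Right-hand side (n / alpha) * (log n + log(1 / eps)) of Theorem 5.11.

    alpha must lie in (0, (K_c - K) / K_c); by default we take the midpoint,
    which is an arbitrary illustrative choice.
    """
    if not (0.0 < beta <= math.log(4.0) and 0.0 < K < k_c2(beta)):
        raise ValueError("outside the region beta <= beta_c, K < K_c^(2)(beta)")
    alpha_max = (k_c2(beta) - K) / k_c2(beta)
    if alpha is None:
        alpha = alpha_max / 2.0
    if not 0.0 < alpha < alpha_max:
        raise ValueError("alpha must lie in (0, (K_c - K) / K_c)")
    return n / alpha * (math.log(n) + math.log(1.0 / eps))
```

The bound grows like $n \log n$, and the constant $1/\alpha$ blows up as $K$ approaches $K_c^{(2)}(\beta)$ from below.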


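Proof. Let $(X,Y)$ be a coupling of the Glauber dynamics of the BC model that begin in
neighboring configurations $\sigma$ and $\tau$ with respect to the path metric $\rho$ defined in (15). By Corollary
5.8 of Lemma 5.7,

\[ E_{\sigma,\tau}[\rho(X,Y)] = 1 - \frac{1}{n}\left(1 - (n-1)\left[c'_\beta\left(2\beta K\frac{S_n(\tau)}{n}\right) - c'_\beta\left(2\beta K\frac{S_n(\sigma)}{n}\right)\right]\right) + O\left(\frac{1}{n^2}\right). \]

Observe that $c''_\beta$ is an even function and that for $\beta \le \beta_c$, $\sup_x c''_\beta(x) = c''_\beta(0)$. Therefore, by the
mean value theorem and Theorem 5.2,

\[ E_{\sigma,\tau}[\rho(X,Y)] \le 1 - \frac{1}{n}\left[1 - (n-1)(2\beta K/n)c''_\beta(0)\right] + O\left(\frac{1}{n^2}\right) \le e^{-\alpha/n} \]

for any $\alpha \in \left(0, \frac{K_c^{(2)}(\beta) - K}{K_c^{(2)}(\beta)}\right)$ and $n$ sufficiently large. Thus, for $K < K_c^{(2)}(\beta)$, we can apply
Theorem 5.10, where the diameter of the configuration space of the BC model $\Omega^n$ is $n$, to
complete the proof. □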


5.5. Aggregate Path Coupling in the Discontinuous Phase Transition Region. Here
we consider the region $\beta > \beta_c$, where the mean-field Blume-Capel model undergoes a first-order
discontinuous phase transition. In this region, the function $c'_\beta(x)$ which determines whether
the mean coupling distance contracts (Corollary 5.8) is no longer strictly concave for $x > 0$
(Theorem 5.9). See the second graph in Figure 9 for reference. We will show that rapid mixing
occurs whenever $K < K_1(\beta)$ where $K_1(\beta)$ is the metastable critical value defined in subsection
5.1 and depicted in the accompanying figure.


As shown in Section 5.3, in order to apply the standard path coupling technique of Theorem
5.10, we need the inequality (17) to hold for all values of $S_n(\sigma)$ and thus $\sup_{[-1,1]} c''_\beta(x) < \frac{1}{2\beta K}$.
However, since $\sup_{[-1,1]} c''_\beta(x) > \frac{1}{2\beta K_1(\beta)}$, the condition $K < K_1(\beta)$ is not sufficient for
the contraction of the mean coupling distance between all pairs of neighboring states, which is
required to prove rapid mixing using the standard path coupling technique stated in Theorem
5.10.


In order to prove rapid mixing in the region where $\beta > \beta_c$ and $K < K_1(\beta)$, we take advantage
of the result in Theorem 5.4 which states the weak convergence of the magnetization $S_n/n$ to
a point-mass at the origin. Thus, in the coupling of the dynamics, the magnetization of the
process that starts at equilibrium will stay mainly near the origin. As a result, for two starting
configurations $\sigma$ and $\tau$, one of which has near-zero magnetization ($S_n(\sigma)/n \approx 0$), the mean
coupling distance of a coupling starting in these configurations will be the aggregate of the
mean coupling distances between neighboring states along a minimal path connecting the two
configurations. Although not all pairs of neighbors in the path will contract, we show that in
the aggregate, contraction between the two configurations still holds.

In the next lemma we prove contraction of the mean coupling distance in the aggregate, and
then the rapid mixing result for the mean-field Blume-Capel model is proved in the theorem
following the lemma by applying the new aggregate path coupling method.

Lemma 5.12. Let $(X,Y)$ be a coupling of one step of the Glauber dynamics of the BC model
that begin in configurations $\sigma$ and $\tau$, not necessarily neighbors with respect to the path metric $\rho$
defined in (15). Suppose $\beta > \beta_c$ and $K < K_1(\beta)$. Then for any $\alpha \in \left(0, \frac{K_1(\beta) - K}{K_1(\beta)}\right)$ there exists
an $\varepsilon > 0$ such that, asymptotically as $n \to \infty$,

\[ E_{\sigma,\tau}[\rho(X,Y)] \le e^{-\alpha/n}\rho(\sigma,\tau) \tag{19} \]

whenever $|S_n(\sigma)| < \varepsilon n$.
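Proof. Observe that for $\beta > \beta_c$ and $K < K_1(\beta)$,

\[ |c'_\beta(x)| \le \frac{|x|}{2\beta K_1(\beta)}. \]

We will show that for a given $\alpha' \in \left(\frac{1}{2\beta K_1(\beta)}, \frac{1}{2\beta K}\right)$ there exists $\varepsilon > 0$ such that

\[ c'_\beta(x) - c'_\beta(x_0) \le \alpha'(x - x_0) \quad \text{whenever } |x_0| < \varepsilon, \tag{20} \]

as $c'_\beta(x)$ is a continuously differentiable increasing odd function and $c'_\beta(0) = 0$.

In order to show (20), observe that $c''_\beta(0) = \frac{1}{2\beta K_c^{(2)}(\beta)} < \frac{1}{2\beta K_1(\beta)}$; since $c''_\beta$ is continuous,
there exists a $\delta > 0$ such that

\[ c''_\beta(x) < \alpha' \quad \text{whenever } |x| < \delta. \]

The mean value theorem implies that

\[ c'_\beta(x) - c'_\beta(x_0) < \alpha'(x - x_0) \quad \text{for all } x_0, x \in (-\delta, \delta). \]

Now, let $\varepsilon = \frac{\alpha' - 1/(2\beta K_1(\beta))}{\alpha' + 1/(2\beta K_1(\beta))}\,\delta < \delta$. Then for any $|x_0| < \varepsilon$ and $|x| \ge \delta$, using
$|x - x_0| \ge (1 - \varepsilon/\delta)|x|$,

\[ |c'_\beta(x) - c'_\beta(x_0)| \le \frac{|x| + |x_0|}{2\beta K_1(\beta)} \le \frac{(1 + \varepsilon/\delta)|x|}{2\beta K_1(\beta)} \le \frac{|x - x_0|}{2\beta K_1(\beta)} \cdot \frac{1 + \varepsilon/\delta}{1 - \varepsilon/\delta} = \alpha'|x - x_0|. \]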

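Without loss of generality suppose that $S_n(\sigma) < S_n(\tau)$. Let $(\sigma = x_0, x_1, \ldots, x_r = \tau)$ be a path
connecting $\sigma$ to $\tau$ and monotone increasing in $\rho$ such that $(x_{i-1}, x_i)$ are neighboring
configurations. Here $r = \rho(\sigma,\tau)$. Then by Corollary 5.8 of Lemma 5.7 and (20), we have for $|S_n(\sigma)| < \varepsilon n$
and asymptotically as $n \to \infty$,

\begin{align*}
E_{\sigma,\tau}[\rho(X,Y)] &\le \sum_{i=1}^{r} E_{x_{i-1},x_i}[\rho(X_{i-1},X_i)] \\
&= \frac{(n-1)}{n}\rho(\sigma,\tau) + \frac{(n-1)}{n}\left[c'_\beta\left(\frac{2\beta K}{n}S_n(\tau)\right) - c'_\beta\left(\frac{2\beta K}{n}S_n(\sigma)\right)\right] + \rho(\sigma,\tau)\cdot O\left(\frac{1}{n^2}\right) \\
&\le \frac{(n-1)}{n}\rho(\sigma,\tau) + \frac{(n-1)}{n}\,2\beta K\alpha'\,\frac{S_n(\tau) - S_n(\sigma)}{n} + \rho(\sigma,\tau)\cdot O\left(\frac{1}{n^2}\right) \\
&\le \rho(\sigma,\tau)\left[1 - \frac{1 - 2\beta K\alpha'}{n} + O\left(\frac{1}{n^2}\right)\right] \\
&\le e^{-\alpha/n}\rho(\sigma,\tau),
\end{align*}

where we used $S_n(\tau) - S_n(\sigma) \le \rho(\sigma,\tau)$ and $\alpha'$ is chosen close enough to $\frac{1}{2\beta K_1(\beta)}$ that
$1 - 2\beta K\alpha' > \alpha$. This completes the proof. □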


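Theorem 5.13. Let $t_{mix}(\varepsilon)$ be the mixing time for the Glauber dynamics of the mean-field
Blume-Capel model on $n$ vertices and $K_1(\beta)$ be the metastable critical point. Then, for $\beta > \beta_c$
and $K < K_1(\beta)$,

\[ t_{mix}(\varepsilon) \le \frac{n}{\alpha}\left(\log n + \log(2/\varepsilon)\right) \]

for any $\alpha \in \left(0, \frac{K_1(\beta) - K}{K_1(\beta)}\right)$ and $n$ sufficiently large.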
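Proof. Let $(X_t, Y_t)$ be a coupling of the Glauber dynamics of the BC model such that
$Y_0 \overset{dist}{=} P_{n,\beta,K}$, the stationary distribution. For a given $\alpha \in \left(0, \frac{K_1(\beta) - K}{K_1(\beta)}\right)$, let $\varepsilon$ be as in Lemma
5.12. For sufficiently large $n$,

\begin{align*}
\|P^t(X_0,\cdot) - P_{n,\beta,K}\|_{TV} &\le P\{X_t \ne Y_t\} \\
&= P\{\rho(X_t,Y_t) \ge 1\} \\
&\le E[\rho(X_t,Y_t)] \\
&= E[E[\rho(X_t,Y_t) \mid X_{t-1},Y_{t-1}]] \\
&\le E[E[\rho(X_t,Y_t) \mid X_{t-1},Y_{t-1}] \mid |S_n(Y_{t-1})| < \varepsilon n] \cdot P\{|S_n(Y_{t-1})| < \varepsilon n\} \\
&\quad + nP\{|S_n(Y_{t-1})| \ge \varepsilon n\}.
\end{align*}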


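By iterating (19), it follows that

\begin{align*}
\|P^t(X_0,\cdot) - P_{n,\beta,K}\|_{TV} &\le e^{-\alpha/n}E[\rho(X_{t-1},Y_{t-1}) \mid |S_n(Y_{t-1})| < \varepsilon n] \cdot P\{|S_n(Y_{t-1})| < \varepsilon n\} \\
&\quad + nP\{|S_n(Y_{t-1})| \ge \varepsilon n\} \\
&\le e^{-\alpha/n}E[\rho(X_{t-1},Y_{t-1})] + nP\{|S_n(Y_{t-1})| \ge \varepsilon n\} \\
&\;\;\vdots \\
&\le e^{-\alpha t/n}E[\rho(X_0,Y_0)] + n\sum_{s=0}^{t-1}P\{|S_n(Y_s)| \ge \varepsilon n\} \\
&= e^{-\alpha t/n}E[\rho(X_0,Y_0)] + ntP_{n,\beta,K}\{|S_n/n| \ge \varepsilon\} \\
&\le ne^{-\alpha t/n} + ntP_{n,\beta,K}\{|S_n/n| \ge \varepsilon\}.
\end{align*}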

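We recall the result in Theorem 5.4 that for $\beta > \beta_c$ and $K < K_1(\beta)$,

\[ P_{n,\beta,K}\{S_n/n \in dx\} \Longrightarrow \delta_0 \quad \text{as } n \to \infty. \]

Moreover, for any $\gamma > 1$ and $n$ sufficiently large, the LDP stated in Theorem 5.1 implies that

\[ \|P^t(X_0,\cdot) - P_{n,\beta,K}\|_{TV} \le ne^{-\alpha t/n} + ntP_{n,\beta,K}\{|S_n/n| \ge \varepsilon\}. \]

For $t = \frac{n}{\alpha}(\log n + \log(2/\varepsilon))$, the above right-hand side converges to $\varepsilon/2$ as $n \to \infty$. □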
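The choice of $t$ in the last step can be verified directly: with $t = \frac{n}{\alpha}(\log n + \log(2/\varepsilon))$ the contraction term $ne^{-\alpha t/n}$ equals $\varepsilon/2$ exactly, so only the large deviation term has to vanish. A minimal numerical check, with illustrative function names:

```python
import math


def aggregate_mixing_time(n, alpha, eps):
    """t = (n / alpha) * (log n + log(2 / eps)) from Theorem 5.13."""
    return n / alpha * (math.log(n) + math.log(2.0 / eps))


def contraction_term(n, alpha, t):
    """First term n * exp(-alpha * t / n) of the total variation bound."""
    return n * math.exp(-alpha * t / n)
```

Evaluating `contraction_term` at `aggregate_mixing_time(n, alpha, eps)` returns `eps / 2` up to rounding, independently of `n` and `alpha`.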


5.6. Slow Mixing. In [26], the slow mixing region of the parameter space was determined for
the mean-field Blume-Capel model. Since the method used to prove the slow mixing, called the
bottleneck ratio or Cheeger constant method, is not a coupling method, we simply state the
result for completeness.

Theorem 5.14. Let $t_{mix} = t_{mix}(1/4)$ be the mixing time for the Glauber dynamics of the
mean-field Blume-Capel model on $n$ vertices. For (a) $\beta \le \beta_c$ and $K > K_c^{(2)}(\beta)$, and (b) $\beta > \beta_c$ and
$K > K_1(\beta)$, there exists a positive constant $b$ and a strictly positive function $r(\beta,K)$ such that

\[ t_{mix} \ge be^{r(\beta,K)n}. \]
Figure 10. Mixing times and equilibrium phase transition structure of the mean-field Blume-
Capel model


We summarize the mixing time results for the mean-field Blume-Capel model and their relationship
to the model's thermodynamic phase transition structure in Figure 10. As shown in the figure,
in the second-order, continuous phase transition region ($\beta \le \beta_c$) for the BC model, the mixing
time transition coincides with the equilibrium phase transition. This is consistent with other
models that exhibit this type of phase transition. However, in the first-order, discontinuous
phase transition region ($\beta > \beta_c$) the mixing time transition occurs below the equilibrium phase
transition at the metastable critical value.


6. Aggregate Path Coupling for General Class of Gibbs Ensembles 

In this section, we extend the aggregate path coupling technique derived in the previous section 
for the Blume-Capel model to a large class of statistical mechanical models that is disjoint from 
the mean-field Blume-Capel model. The aggregate path coupling method presented here extends 
the classical path coupling method for Gibbs ensembles in two directions. First, we consider 
macroscopic quantities in higher dimensions and find a monotone contraction path by considering 
a related variational problem in the continuous space. We also do not require the monotone path 
to be a nearest-neighbor path. In fact, in most situations we consider, a nearest-neighbor path 
will not work for proving contraction. Second, the aggregation of the mean path distance along 
a monotone path is shown to contract for some but not all pairs of configurations. Yet, we use
measure concentration and a large deviation principle to show that contraction for pairs
of configurations in which at least one is close enough to equilibrium is sufficient for
establishing rapid mixing.
Our main result is general enough to be applied to statistical mechanical models that undergo
both types of phase transitions and to models whose macroscopic quantities are in higher
dimensions. Moreover, despite the generality, the application of our results requires straightforward
conditions that we illustrate in a later section. This is a significant simplification for proving rapid
mixing for statistical mechanical models, especially those that undergo first-order, discontinuous
phase transitions. Lastly, our results also provide a link between measure concentration
of the stationary distribution and rapid mixing of the corresponding dynamics for this class of
statistical mechanical models. This idea has been previously studied in [31] where the main
result showed that rapid mixing implied measure concentration defined in terms of Lipschitz
functions. In our work, we prove a type of converse where measure concentration, in terms of a
large deviation principle, implies rapid mixing.


6.1. Class of Gibbs Ensembles. We begin by defining the general class of statistical mechanical
spin models for which our results can be applied. In a later section we illustrate the application
of our main result for the generalized Curie-Weiss-Potts model, for which the mixing times have
not been previously obtained.

Let $q$ be a fixed integer and define $\Lambda = \{e^1, e^2, \ldots, e^q\}$, where $e^k$ are the $q$ standard basis
vectors of $\mathbb{R}^q$. A configuration of the model has the form $\omega = (\omega_1, \omega_2, \ldots, \omega_n) \in \Lambda^n$. We will
consider a configuration on a graph with $n$ vertices and let $X_i(\omega) = \omega_i$ be the spin at vertex $i$.
The random variables $X_i$ for $i = 1, 2, \ldots, n$ are independent and identically distributed with
common distribution $\rho$.


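In terms of the microscopic quantities, the spins at each vertex, the relevant macroscopic
quantity is the magnetization vector (a.k.a. empirical measure or proportion vector)

\[ L_n(\omega) = (L_{n,1}(\omega), L_{n,2}(\omega), \ldots, L_{n,q}(\omega)), \tag{21} \]

where the $k$th component is defined by

\[ L_{n,k}(\omega) = \frac{1}{n}\sum_{i=1}^{n} \delta(\omega_i, e^k), \]

which yields the proportion of spins in configuration $\omega$ that take on the value $e^k$. The
magnetization vector $L_n$ takes values in the set of probability vectors

\[ \mathcal{P}_n = \left\{ \frac{n_k}{n} : \text{each } n_k \in \{0, 1, \ldots, n\} \text{ and } \sum_{k=1}^{q} n_k = n \right\} \tag{22} \]

inside the continuous simplex

\[ \mathcal{P} = \left\{ \nu \in \mathbb{R}^q : \nu = (\nu_1, \nu_2, \ldots, \nu_q), \text{ each } \nu_k \ge 0, \ \sum_{k=1}^{q} \nu_k = 1 \right\}. \]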

Remark 6.1. For $q = 2$, the empirical measure $L_n$ yields the empirical mean $S_n(\omega)/n$ where
$S_n(\omega) = \sum_{i=1}^{n} \omega_i$. Therefore, the class of models considered in this paper includes those where
the relevant macroscopic quantity is the empirical mean, like the Curie-Weiss (mean-field Ising)
model.
As discussed earlier, statistical mechanical models are defined in terms of the Hamiltonian
function, denoted by $H_n(\omega)$, which encodes the interactions of the individual spins and the total
energy of a configuration. The link between the microscopic interactions and the macroscopic
quantity, in this case $L_n(\omega)$, is the interaction representation function, which we define again
for convenience below.

Definition 6.2. For $z \in \mathbb{R}^q$, we define the interaction representation function, denoted by
$H(z)$, to be a differentiable function satisfying

\[ H_n(\omega) = nH(L_n(\omega)). \]

Throughout the paper we suppose the interaction representation function $H(z)$ is a finite concave
function that has the form

\[ H(z) = H_1(z_1) + H_2(z_2) + \ldots + H_q(z_q). \]

For example, for the Curie-Weiss-Potts (CWP) model [11],

\[ H(z) = -\frac{1}{2}\langle z, z\rangle = -\frac{1}{2}z_1^2 - \frac{1}{2}z_2^2 - \ldots - \frac{1}{2}z_q^2. \]

The class of Gibbs measures or Gibbs ensemble considered in this section is defined by

\[ P_{n,\beta}(B) = \frac{1}{Z_n(\beta)}\int_B \exp\{-\beta H_n(\omega)\}\,dP_n = \frac{1}{Z_n(\beta)}\int_B \exp\{-\beta n H(L_n(\omega))\}\,dP_n, \tag{23} \]

where $P_n$ is the product measure with identical marginals $\rho$ and $Z_n(\beta) = \int_{\Lambda^n} \exp\{-\beta H_n(\omega)\}\,dP_n$
is the partition function. The positive parameter $\beta$ represents the inverse temperature of the
external heat bath.

Remark 6.3. To simplify the presentation, we take $\Lambda = \{e^1, e^2, \ldots, e^q\}$, where $e^k$ are the $q$
standard basis vectors of $\mathbb{R}^q$. But our analysis has a straightforward generalization to the case
where $\Lambda = \{\theta^1, \theta^2, \ldots, \theta^q\}$, where $\theta^k$ is any basis of $\mathbb{R}^q$. In this case, the product measure $P_n$
would have identical one-dimensional marginals equal to

\[ \rho = \frac{1}{q}\sum_{i=1}^{q} \delta_{\theta^i}. \]

An important tool we use to prove rapid mixing of the Glauber dynamics that converge to
the Gibbs ensemble above is the large deviation principle of the empirical measure with respect
to the Gibbs ensemble. This measure concentration is precisely what drives the rapid mixing.
The large deviation principle for our class of Gibbs ensembles $P_{n,\beta}$ is presented next.


6.2. Large Deviations. By Sanov's Theorem, the empirical measure $L_n$ satisfies the large
deviation principle (LDP) with respect to the product measure $P_n$ with identical marginals $\rho$
and the rate function is given by the relative entropy

\[ R(\nu|\rho) = \sum_{k=1}^{q} \nu_k \log\left(\frac{\nu_k}{\rho_k}\right) \]

for $\nu \in \mathcal{P}$. Theorem 2.4 of


yields the following result for the Gibbs measures (23). 
Theorem 6.4. The empirical measure $L_n$ satisfies the LDP with respect to the Gibbs measure
$P_{n,\beta}$ with rate function

\[ I_\beta(z) = R(z|\rho) + \beta H(z) - \inf_t\{R(t|\rho) + \beta H(t)\}. \]

As discussed earlier, the LDP upper bound stated in the previous theorem yields the
following natural definition of equilibrium macrostates of the model.

\[ \mathcal{E}_\beta := \{\nu \in \mathcal{P} : \nu \text{ minimizes } R(\nu|\rho) + \beta H(\nu)\} \tag{24} \]

For our main result, we assume that there exists a positive interval $B$ such that for all $\beta \in B$,
$\mathcal{E}_\beta$ consists of a single state $z_\beta$. We refer to this interval $B$ as the single phase region.

Again from the LDP upper bound, when $\beta$ lies in the single phase region, we get

\[ P_{n,\beta}(L_n \in dx) \Longrightarrow \delta_{z_\beta} \quad \text{as } n \to \infty. \tag{25} \]

The above asymptotic behavior will play a key role in obtaining a rapid mixing time rate for
the Glauber dynamics corresponding to the Gibbs measures (23).


An important function in our work is the free energy functional defined below. It is defined
in terms of the interaction representation function $H$ and the logarithmic moment generating
function of a single spin; specifically, for $z \in \mathbb{R}^q$ and $\rho$ equal to the uniform distribution, the
logarithmic moment generating function of $X_1$, the spin at vertex 1, is defined by

\[ \Gamma(z) = \log\left(\frac{1}{q}\sum_{k=1}^{q} \exp\{z_k\}\right). \tag{26} \]

Definition 6.5. The free energy functional for the Gibbs ensemble $P_{n,\beta}$ is defined as

\[ G_\beta(z) = \beta(-H)^*(-\nabla H(z)) - \Gamma(-\beta\nabla H(z)), \tag{27} \]

where for a finite, differentiable, convex function $F$ on $\mathbb{R}^q$, $F^*$ denotes its Legendre-Fenchel
transform defined by

\[ F^*(z) = \sup_{x \in \mathbb{R}^q}\{\langle x, z\rangle - F(x)\}. \]

The following lemma yields an alternative formulation of the set of equilibrium macrostates
of the Gibbs ensemble in terms of the free energy functional. The proof is a straightforward
generalization of Theorem A.1 in [10].


Lemma 6.6. Suppose $H$ is finite, differentiable, and concave. Then

\[ \inf_{z \in \mathcal{P}}\{R(z|\rho) + \beta H(z)\} = \inf_{z \in \mathbb{R}^q}\{G_\beta(z)\}. \]

Moreover, $z_0 \in \mathcal{P}$ is a minimizer of $R(z|\rho) + \beta H(z)$ if and only if $z_0$ is a minimizer of $G_\beta(z)$.

Therefore, the set of equilibrium macrostates can be expressed in terms of the free energy
functional as

\[ \mathcal{E}_\beta = \{z \in \mathcal{P} : z \text{ minimizes } G_\beta(z)\}. \tag{28} \]

As mentioned above, we consider only the single phase region of the Gibbs ensemble; i.e. values
of $\beta$ where $G_\beta(z)$ has a unique global minimum. For example, for the Curie-Weiss-Potts model
[10], the single phase region consists of the values of $\beta$ such that $0 < \beta < \beta_c := (2(q-1)/(q-2))\log(q-1)$.
At this critical value $\beta_c$, the model undergoes a first-order, discontinuous phase transition in
which the single phase changes to a multiple phase discontinuously.
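For the Curie-Weiss-Potts example, both the critical value $\beta_c$ and the free energy functional are easy to evaluate. With $H(z) = -\frac{1}{2}\langle z,z\rangle$ one has $(-H)^*(x) = \frac{1}{2}\langle x,x\rangle$ and $-\nabla H(z) = z$, so (27) reduces to $G_\beta(z) = \frac{\beta}{2}\langle z,z\rangle - \Gamma(\beta z)$. The sketch below implements this special case only; the function names are illustrative.

```python
import math


def beta_c_cwp(q):
    """Critical inverse temperature (2 (q - 1) / (q - 2)) log(q - 1), q >= 3."""
    if q < 3:
        raise ValueError("formula requires q >= 3")
    return 2.0 * (q - 1) / (q - 2) * math.log(q - 1)


def free_energy_cwp(z, beta):
    """G_beta(z) = (beta / 2) <z, z> - log((1 / q) sum_k exp(beta z_k)).

    Obtained from (27) with H(z) = -<z, z> / 2 and rho uniform.
    """
    q = len(z)
    quad = 0.5 * beta * sum(x * x for x in z)
    m = max(beta * x for x in z)  # shift for numerical stability
    log_mgf = m + math.log(sum(math.exp(beta * x - m) for x in z) / q)
    return quad - log_mgf
```

For $q = 3$ the critical value is $4\log 2 \approx 2.77$, and for $\beta$ below it the uniform vector $(1/3, 1/3, 1/3)$ gives a smaller value of $G_\beta$ than nearby non-uniform points of the simplex.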

As we will show, the geometry of the free energy functional $G_\beta$ not only determines the
equilibrium behavior of the Gibbs ensembles but it also yields the condition for rapid mixing of
the corresponding Glauber dynamics.


6.3. Glauber Dynamics. On the configuration space $\Lambda^n$, we define the Glauber dynamics,
defined in general in subsection 5.2, for the class of Gibbs ensembles $P_{n,\beta}$ defined in (23).
These dynamics yield a reversible Markov chain $X^t$ with stationary distribution being the Gibbs
ensemble $P_{n,\beta}$.

(i) Select a vertex $i$ uniformly,

(ii) Update the spin at vertex $i$ according to the distribution $P_{n,\beta}$, conditioned on the event
that the spins at all vertices not equal to $i$ remain unchanged.

For a given configuration $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$, denote by $\sigma_{i,e^k}$ the configuration that agrees
with $\sigma$ at all vertices $j \ne i$ and the spin at the vertex $i$ is $e^k$; i.e.

\[ \sigma_{i,e^k} = (\sigma_1, \sigma_2, \ldots, \sigma_{i-1}, e^k, \sigma_{i+1}, \ldots, \sigma_n). \]

Then if the current configuration is $\sigma$ and vertex $i$ is selected, the probability the spin at $i$ is
updated to $e^k$, denoted by $P(\sigma \to \sigma_{i,e^k})$, is equal to

\[ P(\sigma \to \sigma_{i,e^k}) = \frac{\exp\{-\beta n H(L_n(\sigma_{i,e^k}))\}}{\sum_{\ell=1}^{q} \exp\{-\beta n H(L_n(\sigma_{i,e^\ell}))\}}. \tag{29} \]

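Next, we show that the update probabilities of the Glauber dynamics above can be expressed
in terms of the derivative of the logarithmic moment generating function of the individual spins
$\Gamma$ defined in (26). The partial derivative of $\Gamma$ in the direction of $e^\ell$ has the form

\[ [\partial_\ell \Gamma](z) = \frac{\exp\{z_\ell\}}{\sum_{k=1}^{q} \exp\{z_k\}}. \]

We introduce the following function that plays the key role in our analysis.

\[ g_\ell^{H,\beta}(z) = [\partial_\ell \Gamma](-\beta\nabla H(z)) = \frac{\exp(-\beta[\partial_\ell H](z))}{\sum_{k=1}^{q} \exp(-\beta[\partial_k H](z))}. \tag{30} \]

Denote

\[ g^{H,\beta}(z) := \left(g_1^{H,\beta}(z), \ldots, g_q^{H,\beta}(z)\right). \tag{31} \]

Note that $g^{H,\beta}(z)$ maps the simplex

\[ \mathcal{P} = \left\{ \nu \in \mathbb{R}^q : \nu = (\nu_1, \nu_2, \ldots, \nu_q), \text{ each } \nu_k \ge 0, \ \sum_{k=1}^{q} \nu_k = 1 \right\} \]

into itself and it can be expressed in terms of the free energy functional defined in (27) by

\[ \nabla G_\beta(z) = \beta\left[\nabla(-H)^*(-\nabla H(z)) - g^{H,\beta}(z)\right]. \]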
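Lemma 6.7. Let $P(\sigma \to \sigma_{i,e^k})$ be the Glauber dynamics update probabilities given in (29).
Then, for any $k \in \{1, 2, \ldots, q\}$,

\[ P(\sigma \to \sigma_{i,e^k}) = [\partial_k \Gamma]\left(-\beta\nabla H(L_n(\sigma)) - \frac{\beta}{2n}QH(L_n(\sigma)) + \frac{\beta}{n}\langle QH(L_n(\sigma)), \sigma_i\rangle\sigma_i\right) + O\left(\frac{1}{n^2}\right), \]

where $Q$ is the following linear operator:

\[ QF(z) := \left(\partial_1^2 F(z), \partial_2^2 F(z), \ldots, \partial_q^2 F(z)\right), \]

for any $F : \mathbb{R}^q \to \mathbb{R}$ in $C^2$.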

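Proof. Suppose $\sigma_i = e^m$. By Taylor's theorem, for any $k \ne m$, we have

\begin{align*}
H(L_n(\sigma_{i,e^k})) &= H(L_n(\sigma)) + \left[H_m(L_{n,m}(\sigma) - 1/n) - H_m(L_{n,m}(\sigma))\right] \\
&\quad + \left[H_k(L_{n,k}(\sigma) + 1/n) - H_k(L_{n,k}(\sigma))\right] \\
&= H(L_n(\sigma)) + \frac{1}{n}\left[\partial_k H(L_n(\sigma)) - \partial_m H(L_n(\sigma))\right] \\
&\quad + \frac{1}{2n^2}\left[\partial_k^2 H(L_n(\sigma)) + \partial_m^2 H(L_n(\sigma))\right] + O\left(\frac{1}{n^3}\right).
\end{align*}

Now, if $k = m$,

\begin{align*}
H(L_n(\sigma_{i,e^k})) &= H(L_n(\sigma)) \\
&= H(L_n(\sigma)) + \frac{1}{n}\left[\partial_k H(L_n(\sigma)) - \partial_m H(L_n(\sigma))\right] \\
&\quad + \frac{1}{2n^2}\left[\partial_k^2 H(L_n(\sigma)) + \partial_m^2 H(L_n(\sigma))\right] - \frac{1}{n^2}\partial_m^2 H(L_n(\sigma)).
\end{align*}

This implies that the transition probability (29) has the form

\[ P(\sigma \to \sigma_{i,e^k}) = [\partial_k \Gamma]\left(-\beta\nabla H(L_n(\sigma)) - \frac{\beta}{2n}QH(L_n(\sigma)) + \frac{\beta}{n}\partial_m^2 H(L_n(\sigma))e^m\right) + O\left(\frac{1}{n^2}\right) \]

as $\exp\{O(\frac{1}{n^2})\} = 1 + O(\frac{1}{n^2})$. □

The above Lemma 6.7 can be restated as follows using Taylor expansion.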


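Corollary 6.8. Let $P(\sigma \to \sigma_{i,e^k})$ be the Glauber dynamics update probabilities given in (29).
Then, for any $k \in \{1, 2, \ldots, q\}$,

\[ P(\sigma \to \sigma_{i,e^k}) = g_k^{H,\beta}(L_n(\sigma)) + \frac{\beta}{n}\varphi_{k,\sigma_i}^{H,\beta}(L_n(\sigma)) + O\left(\frac{1}{n^2}\right), \]

where

\[ \varphi_{k,e^r}^{H,\beta}(z) := -\frac{1}{2}\left\langle QH(z), [\nabla\partial_k\Gamma](-\beta\nabla H(z))\right\rangle + \langle e^r, QH(z)\rangle\left\langle e^r, [\nabla\partial_k\Gamma](-\beta\nabla H(z))\right\rangle. \]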
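As an illustration of (29), (30) and Corollary 6.8, the sketch below specializes to the Curie-Weiss-Potts interaction $H(z) = -\frac{1}{2}\langle z,z\rangle$, for which $-\nabla H(z) = z$ and $g^{H,\beta}(z)$ is a softmax of $\beta z$. It compares the exact update probabilities (29) with the leading term $g_k^{H,\beta}(L_n(\sigma))$. The integer encoding of spins and all function names are our own assumptions.

```python
import math


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def g_cwp(z, beta):
    """g^{H,beta}(z) of (30)-(31) for H(z) = -<z, z> / 2, where -grad H(z) = z."""
    return softmax([beta * zk for zk in z])


def exact_update_probabilities(counts, m, beta):
    """Glauber probabilities (29) for the CWP model.

    counts[k] is the number of vertices with spin e^{k+1}; the selected vertex
    currently has spin index m.
    """
    n, q = sum(counts), len(counts)
    log_weights = []
    for k in range(q):
        c = list(counts)
        c[m] -= 1
        c[k] += 1
        h = -0.5 * sum((x / n) ** 2 for x in c)  # H(L_n(sigma_{i, e^k}))
        log_weights.append(-beta * n * h)
    return softmax(log_weights)
```

For a few thousand vertices the exact probabilities and $g^{H,\beta}(L_n(\sigma))$ already agree to several decimal places, consistent with the $\beta/n$ correction in Corollary 6.8.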

In the next section, we define the specific coupling used to bound the mixing time of the
Glauber dynamics for the class of Gibbs ensembles defined in subsection 6.1.
6.4. Coupling of Glauber Dynamics. We begin by defining a metric on the configuration
space $\Lambda^n$. For two configurations $\sigma$ and $\tau$ in $\Lambda^n$, define

\[ d(\sigma,\tau) = \sum_{j=1}^{n} 1\{\sigma_j \ne \tau_j\}, \tag{32} \]

which yields the number of vertices at which the two configurations differ.


Let $X^t$ and $Y^t$ be two copies of the Glauber dynamics. Here, we use the standard greedy coupling of $X^t$ and $Y^t$. At each time step a vertex is selected at random, uniformly from the $n$ vertices. Suppose $X^t = \sigma$, $Y^t = \tau$, and the vertex selected is denoted by $j$. Next, we erase the spin at location $j$ in both processes, and replace it with a new one according to the following update probabilities. For all $\ell = 1, 2, \ldots, q$, define

\[ p_\ell = P(\sigma \to \sigma_{j,e^\ell}) \quad \text{and} \quad q_\ell = P(\tau \to \tau_{j,e^\ell}), \]

and let

\[ P_\ell = \min\{p_\ell, q_\ell\} \quad \text{and} \quad P = \sum_{\ell=1}^{q} P_\ell. \]

Now, let $B$ be a Bernoulli random variable with probability of success $P$. If $B = 1$, we update the two chains equally with the following probabilities

\[ P(X^{t+1}_j = e^\ell,\ Y^{t+1}_j = e^\ell \mid B = 1) = \frac{P_\ell}{P} \]

for $\ell = 1, 2, \ldots, q$. On the other hand, if $B = 0$, we update the chains differently according to the following probabilities

\[ P(X^{t+1}_j = e^\ell,\ Y^{t+1}_j = e^m \mid B = 0) = \frac{p_\ell - P_\ell}{1 - P} \cdot \frac{q_m - P_m}{1 - P} \]

for all pairs $\ell \neq m$. Then the total probability that the two chains update the same is equal to $P$ and the total probability that the chains update differently is equal to $1 - P$.
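The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: spins are encoded as indices 0, ..., q-1, the two probability vectors are supplied by the caller, and the function name `greedy_coupling_step` is hypothetical. Because the residual weights $p_\ell - P_\ell$ and $q_m - P_m$ have disjoint supports, drawing the two spins independently from the residuals reproduces the product formula for the case of a failed Bernoulli trial.

```python
import random

def greedy_coupling_step(p, q_probs, rng=random):
    """Sample new spins (x, y) at the selected vertex for the two coupled chains.

    p[l] and q_probs[l] are the update probabilities of spin l in each chain.
    """
    overlap = [min(a, b) for a, b in zip(p, q_probs)]   # P_l = min{p_l, q_l}
    total_overlap = sum(overlap)                         # P
    res_p = [a - o for a, o in zip(p, overlap)]          # p_l - P_l
    res_q = [b - o for b, o in zip(q_probs, overlap)]    # q_m - P_m
    spins = range(len(p))
    # Bernoulli trial with success probability P; the extra checks only guard
    # against floating-point round-off when the two distributions coincide.
    if rng.random() < total_overlap or sum(res_p) <= 0.0 or sum(res_q) <= 0.0:
        s = rng.choices(spins, weights=overlap)[0]       # same spin in both chains
        return s, s
    x = rng.choices(spins, weights=res_p)[0]
    y = rng.choices(spins, weights=res_q)[0]
    return x, y

# Illustrative probabilities: the overlap P equals 0.2 + 0.3 + 0.2 = 0.7.
rng = random.Random(0)
draws = [greedy_coupling_step([0.5, 0.3, 0.2], [0.2, 0.3, 0.5], rng) for _ in range(20000)]
print(sum(x == y for x, y in draws) / len(draws))
```

Each chain, viewed on its own, still receives a spin from its own update distribution, which is what makes this a coupling.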

Observe that once $X^t = Y^t$, the processes remain matched (coupled) for the rest of the time. In the coupling literature, the time

\[ \min\{t \geq 0 : X^t = Y^t\} \]

is referred to as the coupling time.

As discussed earlier, the mean coupling distance $E[d(X^t, Y^t)]$ is tied to the total variation distance via the following inequality:

(33)    \[ \|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \leq P(X^t \neq Y^t) \leq E[d(X^t, Y^t)]. \]

The above inequality implies that the order of the mean coupling time is an upper bound on the order of the mixing time. See [30] and [29] for details on coupling and coupling inequalities.

6.5. Mean Coupling Distance. Fix $\varepsilon > 0$. Consider two configurations $\sigma$ and $\tau$ such that

\[ d(\sigma, \tau) = d, \]

where $d(\sigma, \tau) \in \mathbb{N}$ is the metric defined in (32) and $\varepsilon \leq \|L_n(\sigma) - L_n(\tau)\|_1 < 2\varepsilon$.


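Let $I = \{i_1, \ldots, i_d\}$ be the set of vertices at which the spin values of the two configurations $\sigma$ and $\tau$ disagree. Define $\kappa(e^\ell)$ to be the probability that the coupled processes update differently when the chosen vertex $j \notin I$ has spin $e^\ell$. If the chosen vertex $j$ is such that $\sigma_j = \tau_j = e^\ell$, then by Corollary 6.8 of Lemma 6.7

\[ \kappa(e^\ell) = \frac{1}{2} \sum_{k=1}^{q} \big| P(\sigma \to \sigma_{j,e^k}) - P(\tau \to \tau_{j,e^k}) \big| \]

\[ = \frac{1}{2} \sum_{k=1}^{q} \Big| g_k^{H,\beta}(L_n(\sigma)) - g_k^{H,\beta}(L_n(\tau)) + \frac{\beta}{n} \big[ \varphi_{k,e^\ell}^{H,\beta}(L_n(\sigma)) - \varphi_{k,e^\ell}^{H,\beta}(L_n(\tau)) \big] \Big| + O\Big(\frac{1}{n^2}\Big) \]

(34)    \[ = \frac{1}{2} \sum_{k=1}^{q} \big| g_k^{H,\beta}(L_n(\sigma)) - g_k^{H,\beta}(L_n(\tau)) \big| + O\Big(\frac{1}{n^2}\Big). \]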


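Next, we observe that for any $C^2$ function $f : \mathcal{P} \to \mathbb{R}$, there exists $C > 0$ such that

(35)    \[ \big| f(z') - f(z) - \langle z' - z, \nabla f(z) \rangle \big| < C\varepsilon^2 \]

for all $z, z' \in \mathcal{P}$ satisfying $\varepsilon \leq \|z' - z\|_1 < 2\varepsilon$.

Therefore for $n$ large enough, there exists $C' > 0$ such that

(36)    \[ \Big| \kappa(e^\ell) - \frac{1}{2} \sum_{k=1}^{q} \big| \big\langle L_n(\tau) - L_n(\sigma), \nabla g_k^{H,\beta}(L_n(\sigma)) \big\rangle \big| \Big| < C'\varepsilon^2. \]

The above result holds regardless of the value of $\ell \in \{1, 2, \ldots, q\}$.

Similarly, when the chosen vertex $j \in I$, the probability of not coupling at $j$ satisfies (36).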


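We conclude that in terms of

\[ \kappa_{\sigma,\tau} := \frac{1}{2} \sum_{k=1}^{q} \big| \big\langle L_n(\tau) - L_n(\sigma), \nabla g_k^{H,\beta}(L_n(\sigma)) \big\rangle \big|, \]

the mean distance between a coupling of the Glauber dynamics starting in $\sigma$ and $\tau$ with $d(\sigma, \tau) = d$ after one step has the form

(37)    \[ E_{\sigma,\tau}[d(X, Y)] \leq d - \frac{d}{n}(1 - \kappa_{\sigma,\tau}) + \frac{n - d}{n}\, \kappa_{\sigma,\tau} + c\varepsilon^2 = d \cdot \Big[ 1 - \frac{1}{n} \Big( 1 - \frac{\kappa_{\sigma,\tau} + c\varepsilon^2}{d/n} \Big) \Big] \]

for a fixed $c > 0$ and all $\varepsilon$ small enough.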
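The algebra behind the two forms of the one-step bound can be checked numerically. The sketch below is illustrative only: the function names are hypothetical and the numerical values are arbitrary. It evaluates the bound in the form "distance minus the gain at disagreeing vertices plus the loss at agreeing vertices" and in the factored form, and makes visible that the factor is below 1 exactly when the probability of updating differently, plus the error term, is smaller than $d/n$.

```python
def one_step_bound(d, n, kappa, err=0.0):
    """d - (d/n)(1 - kappa) + ((n - d)/n) kappa + err, with err playing the role of c*eps^2."""
    return d - (d / n) * (1 - kappa) + ((n - d) / n) * kappa + err

def contraction_factor(d, n, kappa, err=0.0):
    """Factor multiplying d in the factored form of the same bound."""
    return 1 - (1 / n) * (1 - (kappa + err) / (d / n))

# Illustrative values: n = 100 vertices, d = 10 disagreements, kappa = 0.05.
d, n, kappa = 10, 100, 0.05
print(one_step_bound(d, n, kappa), d * contraction_factor(d, n, kappa))
```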
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6.6. Aggregate Path Coupling. In the previous section, we derived the form of the mean 
distance between a coupling of the Glauber dynamics starting in two configurations whose 
distance is bounded. We next derive the form of the mean coupling distance of a coupling 
starting in two configurations that are connected by a path of configurations where the distance
between successive configurations is bounded.

Definition 6.9. Let $\sigma$ and $\tau$ be configurations in $\Lambda^n$. We say that a path $\pi$ connecting configurations $\sigma$ and $\tau$, denoted by

\[ \pi : \sigma = x_0, x_1, \ldots, x_r = \tau, \]

is a monotone path if

(i) $\sum_{i=1}^{r} d(x_{i-1}, x_i) = d(\sigma, \tau)$;

(ii) for each $k = 1, 2, \ldots, q$, the $k$th coordinate of $L_n(x_i)$, $L_{n,k}(x_i)$, is monotonic as $i$ increases from $0$ to $r$.

Observe that here the points $x_i$ on the path are not required to be nearest-neighbors.


A straightforward property of monotone paths is that

\[ \sum_{i=1}^{r} \sum_{k=1}^{q} \big| L_{n,k}(x_i) - L_{n,k}(x_{i-1}) \big| = \|L_n(\sigma) - L_n(\tau)\|_1. \]

Another straightforward observation is that for any given path

\[ L_n(\sigma) = z_0, z_1, \ldots, z_r = L_n(\tau) \]

in $\mathcal{P}_n$, monotone in each coordinate, with $\|z_i - z_{i-1}\|_1 > 0$ for all $i \in \{1, 2, \ldots, r\}$, there exists a monotone path

\[ \pi : \sigma = x_0, x_1, \ldots, x_r = \tau \]

such that $L_n(x_i) = z_i$ for each $i$.
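The two defining conditions of a monotone path, and the additivity property stated above, can be checked mechanically. The following Python sketch is an illustration under assumptions: spins are encoded as integers 0, ..., q-1 and the helper names are hypothetical. The second example path satisfies condition (i) but not condition (ii), which shows that the two conditions are independent.

```python
from collections import Counter

def hamming(x, y):
    """Number of vertices at which configurations x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def empirical(x, q):
    """Empirical measure L_n(x) as a list of q fractions."""
    c = Counter(x)
    return [c.get(k, 0) / len(x) for k in range(q)]

def is_monotone_path(path, q):
    """Check conditions (i) and (ii) of the monotone path definition."""
    steps = sum(hamming(a, b) for a, b in zip(path, path[1:]))
    if steps != hamming(path[0], path[-1]):           # condition (i)
        return False
    profiles = [empirical(x, q) for x in path]
    for k in range(q):                                 # condition (ii)
        coords = [p[k] for p in profiles]
        diffs = [b - a for a, b in zip(coords, coords[1:])]
        if any(v > 0 for v in diffs) and any(v < 0 for v in diffs):
            return False
    return True

def l1_path_length(path, q):
    """Sum over steps of the l1 distance between successive empirical measures."""
    profiles = [empirical(x, q) for x in path]
    return sum(sum(abs(b - a) for a, b in zip(p0, p1))
               for p0, p1 in zip(profiles, profiles[1:]))

good = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 2, 0]]   # second step changes two vertices
bad = [[0, 1], [1, 1], [1, 0]]                       # first coordinate goes down, then up
print(is_monotone_path(good, 3), is_monotone_path(bad, 2))
```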


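Let $\pi : \sigma = x_0, x_1, \ldots, x_r = \tau$ be a monotone path connecting configurations $\sigma$ and $\tau$ such that $\varepsilon \leq \|L_n(x_i) - L_n(x_{i-1})\|_1 < 2\varepsilon$ for all $i = 1, \ldots, r$. Equation (37) implies the following bound on the mean distance between a coupling of the Glauber dynamics starting in configurations $\sigma$ and $\tau$:

\[ E_{\sigma,\tau}[d(X, Y)] \leq \sum_{i=1}^{r} E_{x_{i-1},x_i}[d(X, Y)] \]

\[ \leq \sum_{i=1}^{r} d(x_{i-1}, x_i) \cdot \Bigg[ 1 - \frac{1}{n} \Bigg( 1 - \frac{\frac{1}{2} \sum_{k=1}^{q} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big| + c\varepsilon^2}{d(x_{i-1}, x_i)/n} \Bigg) \Bigg] \]

\[ = d(\sigma, \tau) \Bigg[ 1 - \frac{1}{n} \Bigg( 1 - \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big| + c\varepsilon^2}{2\, d(\sigma, \tau)/n} \Bigg) \Bigg] \]

(38)    \[ \leq d(\sigma, \tau) \Bigg[ 1 - \frac{1}{n} \Bigg( 1 - \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big| + c\varepsilon^2}{\|L_n(\sigma) - L_n(\tau)\|_1} \Bigg) \Bigg] \]

as $\sum_{i=1}^{r} d(x_{i-1}, x_i) = d(\sigma, \tau)$.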

From inequality (38), if there exist monotone paths between all pairs of configurations such that there is a uniform bound less than 1 on the ratio

\[ \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big|}{\|L_n(\sigma) - L_n(\tau)\|_1}, \]

then the mean coupling distance contracts, which yields a bound on the mixing time via the coupling inequality (33).

Although the Gibbs measures are distributions of the empirical measure $L_n$ defined on the discrete space $\mathcal{P}_n$, proving contraction of the mean coupling distance is often facilitated by working in the continuous space, namely the simplex $\mathcal{P}$. We begin our discussion of aggregate path coupling by defining distances along paths in $\mathcal{P}$.

Recall the function $g^{H,\beta}$ defined in (31), which is dependent on the Hamiltonian of the model through the interaction representation function $H$ defined in Definition 6.2.


Definition 6.10. Define the aggregate $g$-variation between a pair of points $x$ and $z$ in $\mathcal{P}$ along a continuous monotone (in each coordinate) path $\rho$ to be

\[ D_\rho^g(x, z) := \sum_{k=1}^{q} \int_\rho \big| \big\langle \nabla g_k^{H,\beta}(y), dy \big\rangle \big|. \]

Define the corresponding pseudo-distance between a pair of points $x$ and $z$ in $\mathcal{P}$ as

\[ d_g(x, z) := \inf_\rho D_\rho^g(x, z), \]

where the infimum is taken over all continuous monotone paths in $\mathcal{P}$ connecting $x$ and $z$.
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Notice if the monotonicity restriction is removed, the above infimum would satisfy the triangle 
inequality. We will need the following condition. 

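Condition 6.11. Let $z_\beta$ be the unique equilibrium macrostate. There exists $\delta \in (0, 1)$ such that

\[ \frac{d_g(z, z_\beta)}{\|z - z_\beta\|_1} \leq 1 - \delta \]

for all $z$ in $\mathcal{P}$.

Observe that if it is shown that $d_g(z, z_\beta) < \|z - z_\beta\|_1$ for all $z$ in $\mathcal{P}$, then by continuity the above condition is equivalent to

\[ \limsup_{z \to z_\beta} \frac{d_g(z, z_\beta)}{\|z - z_\beta\|_1} < 1. \]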


Suppose Condition 6.11 is satisfied. Then let $NG_\delta$ denote the family of smooth curves, monotone in each coordinate, such that for each $z \neq z_\beta$ in $\mathcal{P}$, there is exactly one curve $\rho = \rho_z$ in the family $NG_\delta$ connecting $z_\beta$ to $z$, and

\[ \frac{D_\rho^g(z, z_\beta)}{\|z - z_\beta\|_1} \leq 1 - \delta/2. \]

Such a family of smooth curves will be referred to as neo-geodesic.


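Condition 6.12. For $\varepsilon > 0$ small enough, there exists a neo-geodesic family $NG_\delta$ such that for each $z$ in $\mathcal{P}$ satisfying $\|z - z_\beta\|_1 \geq \varepsilon$, the curve $\rho = \rho_z$ in the family $NG_\delta$ that connects $z_\beta$ to $z$ satisfies

\[ \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle z_i - z_{i-1}, \nabla g_k^{H,\beta}(z_{i-1}) \big\rangle \big|}{\|z - z_\beta\|_1} \leq 1 - \delta/3 \]

for a sequence of points $z_0 = z_\beta, z_1, \ldots, z_r = z$ interpolating $\rho$ such that

\[ \varepsilon \leq \|z_i - z_{i-1}\|_1 < 2\varepsilon \quad \text{for } i = 1, 2, \ldots, r. \]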


It is important to observe that Condition 6.11 is often simpler to verify than Condition 6.12. Moreover, under certain simple additional prerequisites, Condition 6.11 implies Condition 6.12. For example, this is achieved if there is a uniform bound on the Cauchy curvature at every point of every curve in $NG_\delta$. It will be demonstrated later, on the example of the generalized Curie-Weiss-Potts model, that the natural way of establishing Condition 6.12 for the model is via first establishing Condition 6.11.

In addition to Condition 6.12, which will be shown to imply contraction when one of the two configurations in the coupled processes is at the equilibrium, i.e. $L_n(\sigma) = z_\beta$, we need a condition that will imply contraction between two configurations within a neighborhood of the equilibrium configuration. We state this assumption next.

Condition 6.13. Let $z_\beta$ be the unique equilibrium macrostate. Then,

\[ \limsup_{z \to z_\beta} \frac{\|g^{H,\beta}(z) - g^{H,\beta}(z_\beta)\|_1}{\|z - z_\beta\|_1} < 1. \]

Since $H(z) \in C^3$, the above Condition 6.13 implies that for any $\varepsilon > 0$ sufficiently small, there exists $\gamma \in (0, 1)$ such that

\[ \frac{\|g^{H,\beta}(z) - g^{H,\beta}(w)\|_1}{\|z - w\|_1} < 1 - \gamma \]

for all $z$ and $w$ in $\mathcal{P}$ satisfying

\[ \|z - z_\beta\|_1 < \varepsilon \quad \text{and} \quad \|w - z_\beta\|_1 < \varepsilon. \]


6.7. Main Result. As discussed earlier, a sufficient condition for rapid mixing of the Glauber dynamics of Gibbs ensembles is contraction of the mean coupling distance $E_{\sigma,\tau}[d(X, Y)]$ between coupled processes starting in all pairs of configurations in $\Lambda^n$. The classical path coupling argument stated in Proposition 2.4 is a method of obtaining this contraction by only proving contraction between couplings starting in neighboring configurations. However, for some classes of models (e.g. models that undergo a first-order, discontinuous phase transition) there are situations when Glauber dynamics exhibits rapid mixing, but coupled processes do not exhibit contraction between some neighboring configurations. A major strength of the aggregate path coupling method is that it yields a proof of rapid mixing even in those cases when contraction of the mean distance between couplings starting in all pairs of neighboring configurations does not hold.


The strategy is to take advantage of the large deviations estimates discussed earlier. Recall that we assume that the set of equilibrium macrostates, which can be expressed in the form given in (28), consists of a single point $z_\beta$. Define an equilibrium configuration $\sigma_\beta$ to be a configuration such that

\[ L_n(\sigma_\beta) = z_\beta = ((z_\beta)_1, (z_\beta)_2, \ldots, (z_\beta)_q). \]

First we observe that in order to use the coupling inequality (33) we need to show contraction of the mean coupling distance $E_{\sigma,\tau}[d(X, Y)]$ between a Markov chain initially distributed according to the stationary probability distribution $P_{n,\beta}$ and a Markov chain starting at any given configuration. Using large deviations we know that with high probability the former process starts near the equilibrium and stays near the equilibrium for a long duration of time.

Our main result, Theorem 6.15, states that once we establish contraction of the mean coupling distance between two copies of a Markov chain where one of the coupled dynamics starts near an equilibrium configuration (Lemma 6.14), then this contraction, along with the large deviations estimates of the empirical measure $L_n$, yields rapid mixing of the Glauber dynamics converging to the Gibbs measure.

Now, the classical path coupling relies on showing contraction along any monotone path connecting two configurations, in one time step. Here we observe that we only need to show contraction along one monotone path connecting two configurations in order to have the mean coupling distance $E_{\sigma,\tau}[d(X, Y)]$ contract in a single time step. However, finding even one monotone path with which we can show contraction in equation (38) is not easy. The answer to this is in finding a monotone path $\rho$ in $\mathcal{P}$ connecting the $L_n$ values of the two configurations, $\sigma$ and $\tau$, such that

\[ \frac{\sum_{k=1}^{q} \int_\rho \big| \big\langle \nabla g_k^{H,\beta}(y), dy \big\rangle \big|}{\|L_n(\sigma) - L_n(\tau)\|_1} < 1. \]

Although $\rho$ is a continuous path in the continuous space $\mathcal{P}$, it serves as Ariadne's thread for finding a monotone path

\[ \pi : \sigma = x_0, x_1, \ldots, x_r = \tau \]

such that $L_n(x_0), L_n(x_1), \ldots, L_n(x_r)$ in $\mathcal{P}_n$ are positioned along $\rho$, and

\[ \sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big| \]

is a Riemann sum approximating $\sum_{k=1}^{q} \int_\rho \big| \big\langle \nabla g_k^{H,\beta}(y), dy \big\rangle \big|$. Therefore we obtain

\[ \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big|}{\|L_n(\sigma) - L_n(\tau)\|_1} < 1, \]

that in turn implies contraction in (38) for $\varepsilon$ small enough and $n$ large enough. See Figure 11.

Figure 11. Case $q = 3$. Dashed curve is the continuous monotone path $\rho$. Solid lines represent the path $L_n(x_0), L_n(x_1), \ldots, L_n(x_r)$ in $\mathcal{P}_n$.

Observe that in order for the above argument to work, we need to spread points $L_n(x_i) \in \mathcal{P}_n$ along a continuous path $\rho$ at intervals of fixed order $\varepsilon$. Thus $\pi$ cannot be a nearest-neighbor path in the space of configurations, another significant deviation from the classical path coupling.

Lemma 6.14. Assume Condition 6.12 and Condition 6.13. Let $(X, Y)$ be a coupling of the Glauber dynamics as defined in Section 6.4, starting in configurations $\sigma$ and $\tau$, and let $z_\beta$ be the single equilibrium macrostate of the corresponding Gibbs ensemble. Then there exists an $\alpha > 0$ and an $\varepsilon' > 0$ small enough such that for $n$ large enough,

\[ E_{\sigma,\tau}[d(X, Y)] \leq e^{-\alpha/n}\, d(\sigma, \tau) \]

whenever $\|L_n(\sigma) - z_\beta\|_1 < \varepsilon'$.


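Proof. Let $\varepsilon$ and $\delta$ be as in Condition 6.12 and let $\varepsilon' = \varepsilon^2 \delta / M$ with a constant $M \gg 0$.

Case I. Suppose $L_n(\tau) = z$ and $L_n(\sigma) = w$, where $\|z - z_\beta\|_1 \geq \varepsilon$ and $\|w - z_\beta\|_1 < \varepsilon'$.

Then there is an equilibrium configuration $\sigma_\beta$ with $L_n(\sigma_\beta) = z_\beta$ such that there is a monotone path

\[ \pi' : \sigma_\beta = x'_0, x'_1, \ldots, x'_r = \tau \]

connecting configurations $\sigma_\beta$ and $\tau$ on $\Lambda^n$ such that $\varepsilon \leq \|L_n(x'_i) - L_n(x'_{i-1})\|_1 < 2\varepsilon$, and by Condition 6.12

\[ \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x'_i) - L_n(x'_{i-1}), \nabla g_k^{H,\beta}(L_n(x'_{i-1})) \big\rangle \big|}{\|L_n(\sigma_\beta) - L_n(\tau)\|_1} \leq 1 - \delta/4 \]

for $n$ large enough. Note that the difference between the above inequality and Condition 6.12 is that here we take $L_n(x'_i) \in \mathcal{P}_n$.

Now, there exists a monotone path from $\sigma$ to $\tau$

\[ \pi : \sigma = x_0, x_1, \ldots, x_r = \tau \]

such that

\[ \|L_n(x_i) - L_n(x'_i)\|_1 < \varepsilon' \quad \text{for all } i = 0, 1, \ldots, r. \]

The new monotone path $\pi$ is constructed from $\pi'$ by ensuring that either

\[ 0 \leq \big\langle L_n(x_i) - L_n(x_{i-1}), e^k \big\rangle \leq \big\langle L_n(x'_i) - L_n(x'_{i-1}), e^k \big\rangle \]

or

\[ \big\langle L_n(x'_i) - L_n(x'_{i-1}), e^k \big\rangle \leq \big\langle L_n(x_i) - L_n(x_{i-1}), e^k \big\rangle \leq 0 \]

for $i = 2, \ldots, r$ and each coordinate $k \in \{1, 2, \ldots, q\}$.

Then

\[ \Bigg| \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big|}{\|L_n(\sigma) - L_n(\tau)\|_1} - \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x'_i) - L_n(x'_{i-1}), \nabla g_k^{H,\beta}(L_n(x'_{i-1})) \big\rangle \big|}{\|L_n(\sigma_\beta) - L_n(\tau)\|_1} \Bigg| \leq C'' r \varepsilon' / \varepsilon \]

for a fixed constant $C'' > 0$. Noticing that $r\varepsilon'/\varepsilon \leq \delta/M$ as $r \leq 1/\varepsilon$, and taking $M$ large enough, we obtain

\[ \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big|}{\|L_n(\sigma) - L_n(\tau)\|_1} \leq 1 - \delta/4. \]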

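Thus equation (38) will imply

\[ E_{\sigma,\tau}[d(X, Y)] \leq d(\sigma, \tau) \Bigg[ 1 - \frac{1}{n} \Bigg( 1 - \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle L_n(x_i) - L_n(x_{i-1}), \nabla g_k^{H,\beta}(L_n(x_{i-1})) \big\rangle \big| + c\varepsilon^2}{\|L_n(\sigma) - L_n(\tau)\|_1} \Bigg) \Bigg] \]

\[ \leq d(\sigma, \tau) \Big[ 1 - \frac{1}{n} \big( 1 - (1 - \delta/4) - \delta/20 \big) \Big] \]

\[ = d(\sigma, \tau) \Big[ 1 - \frac{1}{n} \cdot \frac{\delta}{5} \Big] \]

as

\[ \frac{c\varepsilon^2}{\|L_n(\sigma) - L_n(\tau)\|_1} \leq c\varepsilon < \delta/20 \quad \text{for } \varepsilon \text{ small enough.} \]

Case II. Suppose $L_n(\tau) = z$ and $L_n(\sigma) = w$, where $\|z - z_\beta\|_1 < \varepsilon$ and $\|w - z_\beta\|_1 < \varepsilon'$.

Similarly to (37), equation (34) implies for $n$ large enough,

\[ E_{\sigma,\tau}[d(X, Y)] \leq d(\sigma, \tau) \Bigg[ 1 - \frac{1}{n} \Bigg( 1 - \frac{\|g^{H,\beta}(L_n(\sigma)) - g^{H,\beta}(L_n(\tau))\|_1}{\|L_n(\sigma) - L_n(\tau)\|_1} \Bigg) + O\Big(\frac{1}{n^2}\Big) \Bigg] \]

\[ \leq d(\sigma, \tau) \Big[ 1 - \frac{\gamma}{n} + O\Big(\frac{1}{n^2}\Big) \Big] \]

\[ \leq d(\sigma, \tau) \Big[ 1 - \frac{\gamma}{2n} \Big] \]

by Condition 6.13 (see also the discussion following Condition 6.13).

□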


We now state and prove the main theorem of the paper that yields sufficient conditions for 
rapid mixing of the Glauber dynamics of the class of statistical mechanical models discussed. 


Theorem 6.15. Suppose $H(z)$ and $\beta > 0$ are such that Condition 6.12 and Condition 6.13 are satisfied. Then the mixing time of the Glauber dynamics satisfies

\[ t_{mix} = O(n \log n). \]


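Proof. Let $\varepsilon' > 0$ and $\alpha > 0$ be as in Lemma 6.14. Let $(X^t, Y^t)$ be a coupling of the Glauber dynamics such that $X^0 \sim P_{n,\beta}$, the stationary distribution. Then, for sufficiently large $n$,

\[ \|P^t(Y^0, \cdot) - P_{n,\beta}\|_{TV} \leq P\{X^t \neq Y^t\} = P\{d(X^t, Y^t) \geq 1\} \leq E[d(X^t, Y^t)] = E\big[ E[d(X^t, Y^t) \mid X^{t-1}, Y^{t-1}] \big] \]

\[ \leq E\big[ E[d(X^t, Y^t) \mid X^{t-1}, Y^{t-1}] \,\big|\, \|L_n(X^{t-1}) - z_\beta\|_1 < \varepsilon' \big] \cdot P\{\|L_n(X^{t-1}) - z_\beta\|_1 < \varepsilon'\} + n P\{\|L_n(X^{t-1}) - z_\beta\|_1 \geq \varepsilon'\}. \]

By Lemma 6.14, we have

(39)    \[ E\big[ E[d(X^t, Y^t) \mid X^{t-1}, Y^{t-1}] \,\big|\, \|L_n(X^{t-1}) - z_\beta\|_1 < \varepsilon' \big] \leq e^{-\alpha/n} E\big[ d(X^{t-1}, Y^{t-1}) \,\big|\, \|L_n(X^{t-1}) - z_\beta\|_1 < \varepsilon' \big]. \]

By iterating (39), it follows that

\[ \|P^t(Y^0, \cdot) - P_{n,\beta}\|_{TV} \leq e^{-\alpha/n} E\big[ d(X^{t-1}, Y^{t-1}) \,\big|\, \|L_n(X^{t-1}) - z_\beta\|_1 < \varepsilon' \big] \cdot P\{\|L_n(X^{t-1}) - z_\beta\|_1 < \varepsilon'\} + n P\{\|L_n(X^{t-1}) - z_\beta\|_1 \geq \varepsilon'\} \]

\[ \leq e^{-\alpha/n} E[d(X^{t-1}, Y^{t-1})] + n P\{\|L_n(X^{t-1}) - z_\beta\|_1 \geq \varepsilon'\} \]

\[ \leq e^{-\alpha t/n} E[d(X^0, Y^0)] + n \sum_{s=0}^{t-1} P\{\|L_n(X^s) - z_\beta\|_1 \geq \varepsilon'\} \]

\[ = e^{-\alpha t/n} E[d(X^0, Y^0)] + nt\, P_{n,\beta}\{\|L_n(X^0) - z_\beta\|_1 \geq \varepsilon'\} \]

\[ \leq n e^{-\alpha t/n} + nt\, P_{n,\beta}\{\|L_n(X^0) - z_\beta\|_1 \geq \varepsilon'\}. \]

We recall the LDP limit (25) for $\beta$ in the single phase region $B$,

\[ P_{n,\beta}\{L_n(X^0) \in dx\} \Longrightarrow \delta_{z_\beta} \quad \text{as } n \to \infty. \]

Moreover, for any $\gamma' > 1$ and $n$ sufficiently large, by the LDP upper bound, we have

\[ \|P^t(Y^0, \cdot) - P_{n,\beta}\|_{TV} \leq n e^{-\alpha t/n} + tn \exp\Big\{ -\frac{n}{\gamma'} \inf\big\{ I_\beta(z) : \|z - z_\beta\|_1 \geq \varepsilon' \big\} \Big\}, \]

where $I_\beta$ denotes the rate function of the large deviation principle.

For $t = \frac{n}{\alpha}(\log n + \log(2/\varepsilon'))$, the above right-hand side converges to $\varepsilon'/2$ as $n \to \infty$.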


□ 
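The choice of $t$ at the end of the proof can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical and the numerical values of $n$, $\alpha$, and $\varepsilon'$ are arbitrary. It shows that with $t = \frac{n}{\alpha}(\log n + \log(2/\varepsilon'))$ the contraction term $n e^{-\alpha t/n}$ equals $\varepsilon'/2$ exactly, and that this $t$ grows like $n \log n$.

```python
import math

def mixing_time_bound(n, alpha, eps):
    """t = (n / alpha) * (log n + log(2 / eps)), the choice of t made in the proof."""
    return (n / alpha) * (math.log(n) + math.log(2.0 / eps))

def contraction_term(n, alpha, t):
    """First term n * exp(-alpha * t / n) of the total variation bound."""
    return n * math.exp(-alpha * t / n)

# Arbitrary illustrative values.
n, alpha, eps = 1000, 0.2, 0.1
t = mixing_time_bound(n, alpha, eps)
print(t / (n * math.log(n)), contraction_term(n, alpha, t))
```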


7. Aggregate Path Coupling applied to the Generalized Potts Model

In this section, we illustrate the strength of our main result, Theorem 6.15, by applying it to the generalized Curie-Weiss-Potts model (GCWP), studied recently in [25]. The classical Curie-Weiss-Potts (CWP) model, which is the mean-field version of the well known Potts model of statistical mechanics [33], is a particular case of the GCWP model with $r = 2$. While the mixing times for the CWP model have been studied previously, these are the first results for the mixing times of the GCWP model.

Let $q$ be a fixed integer and define $\Lambda = \{e^1, e^2, \ldots, e^q\}$, where $e^k$ are the $q$ standard basis vectors of $\mathbb{R}^q$. A configuration of the model has the form $\omega = (\omega_1, \omega_2, \ldots, \omega_n) \in \Lambda^n$. We will consider a configuration on a graph with $n$ vertices and let $X_i(\omega) = \omega_i$ be the spin at vertex $i$. The random variables $X_i$ for $i = 1, 2, \ldots, n$ are independent and identically distributed with common distribution $\rho$.

For the generalized Curie-Weiss-Potts model, for $r \geq 2$, the interaction representation function, defined in general in Definition 6.2, has the form

\[ H^r(z) = -\frac{1}{r} \sum_{j=1}^{q} z_j^r \]

and the generalized Curie-Weiss-Potts model is defined as the Gibbs measure

(40)    \[ P_{n,\beta,r}(B) = \frac{1}{Z_n(\beta)} \int_B \exp\{-\beta n H^r(L_n(\omega))\}\, dP_n \]

where $L_n(\omega)$ is the empirical measure defined in (21).


In [25], the authors proved that there exists a phase transition critical value $\beta_c(q, r)$ such that in the parameter regime $(q, r) \in \{2\} \times [2, 4]$, the GCWP model undergoes a continuous, second-order, phase transition and for $(q, r)$ in the complementary regime, the GCWP model undergoes a discontinuous, first-order, phase transition. This is stated in the following theorem.

Theorem 7.1 (Generalized Ellis-Wang Theorem). Assume that $q \geq 2$ and $r \geq 2$. Then there exists a critical temperature $\beta_c(q, r) > 0$ such that in the weak limit

\[ \lim_{n \to \infty} P_{n,\beta,r}(L_n \in \cdot) = \begin{cases} \delta_{\frac{1}{q}(1, \ldots, 1)} & \text{if } \beta < \beta_c(q, r), \\ \frac{1}{q} \sum_{i=1}^{q} \delta_{u(\beta,q,r) e^i + \frac{1 - u(\beta,q,r)}{q}(1, \ldots, 1)} & \text{if } \beta > \beta_c(q, r), \end{cases} \]

where $u(\beta, q, r)$ is the largest solution to the so-called mean-field equation

\[ u = \frac{1 - \exp(\Delta(u))}{1 + (q - 1)\exp(\Delta(u))} \]

with $\Delta(u) := -\frac{\beta}{q^{r-1}} \big[ (1 + (q - 1)u)^{r-1} - (1 - u)^{r-1} \big]$. Moreover, for $(q, r) \in \{2\} \times [2, 4]$, the function $\beta \mapsto u(\beta, q, r)$ is continuous whereas, in the complementary case, the function is discontinuous at $\beta_c(q, r)$.
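The largest solution of the mean-field equation can be located numerically. The following Python sketch is one possible approach, not a method taken from [25]: it scans a grid downward from $u = 1$ for a sign change and refines it by bisection. The function names, the grid size, and the tolerance are assumptions made for this illustration. For $q = 2$ and $r = 2$ the equation reduces to $u = \tanh(\beta u / 2)$, which gives a convenient sanity check.

```python
import math

def mean_field_rhs(u, beta, q, r):
    """Right-hand side (1 - exp(Delta)) / (1 + (q - 1) exp(Delta)) of the mean-field equation."""
    delta = -(beta / q ** (r - 1)) * ((1 + (q - 1) * u) ** (r - 1) - (1 - u) ** (r - 1))
    e = math.exp(delta)
    return (1 - e) / (1 + (q - 1) * e)

def largest_mean_field_solution(beta, q, r, grid=10_000, tol=1e-12):
    """Largest root of u = rhs(u) on [0, 1]; returns 0.0 if only the trivial root is found."""
    def f(u):
        return u - mean_field_rhs(u, beta, q, r)
    hi = 1.0                          # f(1) > 0 because the right-hand side is below 1
    for j in range(grid - 1, 0, -1):
        lo = j / grid
        if f(lo) <= 0:                # sign change located in [lo, hi]
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if f(mid) <= 0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        hi = lo
    return 0.0

print(largest_mean_field_solution(1.0, 2, 2), largest_mean_field_solution(3.0, 2, 2))
```

A grid scan can miss roots that are closer together than the grid spacing, so the result should be read as a numerical estimate only.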


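For the GCWP model, the function $g^{H,\beta}(z)$ defined in general in (30) has the form

\[ g_k^{H,\beta}(z) = [\partial_k \Gamma](-\beta \nabla H^r(z)) = [\partial_k \Gamma]\big( \beta z_1^{r-1}, \ldots, \beta z_q^{r-1} \big) = \frac{e^{\beta z_k^{r-1}}}{e^{\beta z_1^{r-1}} + \cdots + e^{\beta z_q^{r-1}}}. \]

For the remainder of this section, we will replace the notation $H, \beta$ and refer to $g^{H,\beta}(z) = (g_1^{H,\beta}(z), \ldots, g_q^{H,\beta}(z))$ as simply $g^r(z) = (g_1^r(z), \ldots, g_q^r(z))$. As we will prove next, the rapid mixing region for the GCWP model is defined by the following value.

(41)    \[ \beta_s(q, r) := \sup\big\{ \beta \geq 0 : g_k^r(z) < z_k \text{ for all } z \in \mathcal{P} \text{ such that } z_k \in (1/q, 1] \big\}. \]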
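The condition inside the supremum in (41) can be probed numerically. The Python sketch below is a heuristic illustration, not a computation of $\beta_s(q, r)$: it samples points of the simplex at random and reports a point at which $g_k^r(z) \geq z_k$ for some coordinate with $z_k > 1/q$, if it finds one. Finding no violation among the samples does not prove the condition. The function names, the sampling scheme, and the sample size are assumptions made for this example. For $q = 2$, $r = 2$ the condition reads $\tanh(\beta s / 2) < s$ for $s \in (0, 1]$, which holds for $\beta \leq 2$ and fails for $\beta > 2$.

```python
import math
import random

def g(z, beta, r):
    """The map g^r for the GCWP model: g_k(z) is proportional to exp(beta * z_k**(r-1))."""
    w = [math.exp(beta * zk ** (r - 1)) for zk in z]
    s = sum(w)
    return [wk / s for wk in w]

def find_violation(beta, q, r, samples=20000, seed=0):
    """Return a sampled z with g_k(z) >= z_k for some z_k > 1/q, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(samples):
        # Normalized independent exponentials are uniformly distributed on the simplex.
        e = [rng.expovariate(1.0) for _ in range(q)]
        total = sum(e)
        z = [x / total for x in e]
        gz = g(z, beta, r)
        if any(zk > 1 / q and gk >= zk for zk, gk in zip(z, gz)):
            return z
    return None

print(find_violation(1.0, 2, 2), find_violation(3.0, 2, 2))
```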


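Lemma 7.2. If $\beta_c(q, r)$ is the critical value derived in [25] and defined in Theorem 7.1, then

\[ \beta_s(q, r) \leq \beta_c(q, r). \]

Proof. We will prove this lemma by contradiction. Suppose $\beta_c(q, r) < \beta_s(q, r)$. Then there exists $\beta$ such that

\[ \beta_c(q, r) < \beta < \beta_s(q, r). \]

Then, by Theorem 7.1, since $\beta_c(q, r) < \beta$, there exists $u > 0$ satisfying the following inequality

(42)    \[ u < \frac{1 - \exp(\Delta(u))}{1 + (q - 1)\exp(\Delta(u))}, \]

where $\Delta(u) := -\frac{\beta}{q^{r-1}} \big[ (1 + (q - 1)u)^{r-1} - (1 - u)^{r-1} \big]$. Here, the above inequality (42) rewrites as

(43)    \[ \exp(\Delta(u)) = \exp\Big\{ \beta \Big[ \Big( \frac{1 - u}{q} \Big)^{r-1} - \Big( \frac{1 + (q - 1)u}{q} \Big)^{r-1} \Big] \Big\} < \frac{1 - u}{(q - 1)u + 1}. \]

Next, we substitute $\lambda = (1 - u)\frac{q - 1}{q}$ into the above inequality (43), obtaining

(44)    \[ \exp\Big\{ \beta \Big[ \Big( \frac{\lambda}{q - 1} \Big)^{r-1} - (1 - \lambda)^{r-1} \Big] \Big\} < \frac{\lambda}{(1 - \lambda)(q - 1)}. \]

Now, consider

\[ z = \Big( 1 - \lambda, \frac{\lambda}{q - 1}, \ldots, \frac{\lambda}{q - 1} \Big). \]

Observe that $z_1 = 1 - \lambda = 1 - (1 - u)\frac{q - 1}{q} > \frac{1}{q}$ as $u > 0$. Here, the inequality (44) can be consequently rewritten in terms of the above selected $z$ as follows

\[ z_1 = 1 - \lambda < \frac{e^{\beta (1 - \lambda)^{r-1}}}{e^{\beta (1 - \lambda)^{r-1}} + (q - 1)\, e^{\beta (\lambda/(q-1))^{r-1}}} = g_1^r(z), \]

thus contradicting $\beta < \beta_s(q, r)$. Hence $\beta_s(q, r) \leq \beta_c(q, r)$. □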


Combining Theorem 7.1 and Lemma 7.2 yields that for parameter values $(q, r)$ in the continuous, second-order phase transition region $\beta_s(q, r) = \beta_c(q, r)$, whereas in the discontinuous, first-order, phase transition region, $\beta_s(q, r)$ is strictly less than $\beta_c(q, r)$. This relationship between the equilibrium transition critical value and the mixing time transition critical value was also proved for the mean-field Blume-Capel model discussed elsewhere in this survey. This appears to be a general distinguishing feature between models that exhibit the two distinct types of phase transition. We now prove rapid mixing for the generalized Curie-Weiss-Potts model for $\beta < \beta_s(q, r)$ using the aggregate path coupling method derived above.

We state the lemmas that we prove below, and the main result for the Glauber dynamics of the generalized Curie-Weiss-Potts model, a Corollary to Theorem 6.15. Let $\beta_s(q)$ be the mixing time critical value for the GCWP model defined in (41).

Lemma 7.3. Condition 6.11 and Condition 6.12 are satisfied for all $\beta < \beta_s(q)$.

Lemma 7.4. Condition 6.13 is satisfied for all $\beta < \beta_s(q)$.

Corollary 7.5. If $\beta < \beta_s(q)$, then

\[ t_{mix} = O(n \log n). \]

Proof. Condition 6.12 and Condition 6.13 required for Theorem 6.15 are satisfied by Lemma 7.3 and Lemma 7.4. □


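Proof of Lemma 7.4. Denote $z' = (z'_1, \ldots, z'_q) = z - z_\beta$. Then by Taylor's Theorem, we have

\[ \limsup_{z \to z_\beta} \frac{\|g^r(z) - g^r(z_\beta)\|_1}{\|z - z_\beta\|_1} \]

(45)    \[ = \limsup_{z \to z_\beta} \frac{\sum_{k=1}^{q} \Big| \frac{e^{\beta z_k^{r-1}}}{\sum_{j=1}^{q} e^{\beta z_j^{r-1}}} - \frac{1}{q} \Big|}{\sum_{k=1}^{q} \big| z_k - \frac{1}{q} \big|} \]

\[ = \lim_{z' \to 0} \frac{\sum_{k=1}^{q} \Big| \frac{\beta (r - 1)}{q} \Big( \frac{1}{q} \Big)^{r-2} z'_k + O\big( (z'_1)^2 + \cdots + (z'_q)^2 \big) \Big|}{\sum_{k=1}^{q} |z'_k|} \]

\[ = \frac{\beta (r - 1)}{q} \Big( \frac{1}{q} \Big)^{r-2}. \]

It was shown previously that for $r = 2$, $\beta < \beta_s(q) \leq \beta_c(q) < q$. Therefore, the last expression above is less than 1 and we conclude that

\[ \limsup_{z \to z_\beta} \frac{\|g^r(z) - g^r(z_\beta)\|_1}{\|z - z_\beta\|_1} < 1. \]

□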


Proof of Lemma 7.3. First, we prove that the family of straight lines connecting to the equilibrium point $z_\beta = (1/q, \ldots, 1/q)$ is a neo-geodesic family as it was defined following Condition 6.11. Specifically, for any $z = (z_1, z_2, \ldots, z_q) \in \mathcal{P}$ define the line path $\rho$ connecting $z$ to $z_\beta$ by

(46)    \[ z(t) = z_\beta (1 - t) + z t, \quad 0 \leq t \leq 1. \]

Then, along this straight-line path $\rho$, the aggregate $g$-variation has the form

\[ D_\rho^g(z, z_\beta) := \sum_{k=1}^{q} \int_\rho \big| \big\langle \nabla g_k^r(y), dy \big\rangle \big| = \sum_{k=1}^{q} \int_0^1 \Big| \frac{d}{dt} \big[ g_k^r(z(t)) \big] \Big|\, dt. \]
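The aggregate $g$-variation along the straight-line path can be approximated by summing the absolute increments of each coordinate of $g^r$ over a fine partition of $[0, 1]$. The Python sketch below is illustrative: the function names and the number of steps are assumptions, and the sum is only a numerical approximation (from below) of the integral. For a value of $\beta$ well inside the rapid mixing region, the result can be compared with $\|z - z_\beta\|_1$.

```python
import math

def g(z, beta, r):
    """The map g^r for the GCWP model: g_k(z) is proportional to exp(beta * z_k**(r-1))."""
    w = [math.exp(beta * zk ** (r - 1)) for zk in z]
    s = sum(w)
    return [wk / s for wk in w]

def aggregate_variation_straight_line(z, beta, r, steps=2000):
    """Approximate the sum over k of the total variation of t -> g_k(z(t)) on [0, 1]."""
    q = len(z)
    prev = g([1.0 / q] * q, beta, r)          # g at the equilibrium point (t = 0)
    total = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        zt = [(1 - t) / q + zk * t for zk in z]
        cur = g(zt, beta, r)
        total += sum(abs(c - p) for c, p in zip(cur, prev))
        prev = cur
    return total

z = [0.6, 0.3, 0.1]
l1 = sum(abs(zk - 1 / 3) for zk in z)
print(aggregate_variation_straight_line(z, 1.0, 2), l1)
```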


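Next, for all $k = 1, 2, \ldots, q$ and $t \in [0, 1]$, denote

\[ z(t)_k = \frac{1}{q}(1 - t) + z_k t. \]

Then

(47)    \[ g_k^r(z(t)) = \frac{e^{\beta \left( \frac{1}{q}(1 - t) + z_k t \right)^{r-1}}}{\sum_{j=1}^{q} e^{\beta \left( \frac{1}{q}(1 - t) + z_j t \right)^{r-1}}} \]

and

(48)    \[ \frac{d}{dt} \big[ g_k^r(z(t)) \big] = \beta (r - 1)\, g_k^r(z(t)) \Big[ \Big( \frac{1}{q}(1 - t) + z_k t \Big)^{r-2} \Big( z_k - \frac{1}{q} \Big) - \big\langle z - z_\beta, g^r(z(t)) \big\rangle_\rho \Big] \]

where $\langle z - z_\beta, g^r(z(t)) \rangle_\rho$ is the weighted inner product

\[ \big\langle z - z_\beta, g^r(z(t)) \big\rangle_\rho := \sum_{j=1}^{q} g_j^r(z(t)) \Big( z_j - \frac{1}{q} \Big) \Big( \frac{1}{q}(1 - t) + z_j t \Big)^{r-2}. \]

Now, observe that for $z(t)$ as in (46) with $z \neq z_\beta$, the inner product $\langle z - z_\beta, g^r(z(t)) \rangle_\rho$ is monotonically increasing in $t$ since

\[ \frac{d}{dt} \big\langle z - z_\beta, g^r(z(t)) \big\rangle_\rho \geq \beta (r - 1)\, \mathrm{Var}_{g^r} \Big( \Big( z_j - \frac{1}{q} \Big) \Big( \frac{1}{q}(1 - t) + z_j t \Big)^{r-2} \Big) > 0 \]

where $\mathrm{Var}_{g^r}(\cdot)$ is the variance with respect to $g^r$.

So $\langle z - z_\beta, g^r(z(t)) \rangle_\rho$ begins at $\langle z - z_\beta, g^r(z(0)) \rangle_\rho = \langle z - z_\beta, z_\beta \rangle_\rho = 0$ and increases for all $t \in (0, 1)$.

The above monotonicity yields the following claim about the behavior of $g_k^r(z(t))$ along the straight-line path $\rho$.

(a) If $z_k \leq 1/q$, then $g_k^r(z(t))$ is monotonically decreasing in $t$.

(b) If $z_k > 1/q$, then $g_k^r(z(t))$ has at most one critical point $t_k^*$ on $(0, 1)$.

The above claim (a) follows immediately from (48) as $\langle z - z_\beta, g^r(z(t)) \rangle_\rho > 0$ for $t > 0$. Claim (b) also follows from (48) as, on its right-hand side, $z_k - 1/q > 0$ and $\langle z - z_\beta, g^r(z(t)) \rangle_\rho$ is increasing. Thus there is at most one point $t_k^*$ on $(0, 1)$ such that $\frac{d}{dt} \big[ g_k^r(z(t)) \big] = 0$.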


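Next, define

\[ A_z = \{k : z_k > 1/q\}. \]

Then the aggregate $g$-variation can be split into

\[ D_\rho^g(z, z_\beta) = \sum_{k \in A_z} \int_0^1 \Big| \frac{d}{dt} \big[ g_k^r(z(t)) \big] \Big|\, dt + \sum_{k \notin A_z} \int_0^1 \Big| \frac{d}{dt} \big[ g_k^r(z(t)) \big] \Big|\, dt. \]

For $k \notin A_z$, claim (a) implies

\[ \int_0^1 \Big| \frac{d}{dt} \big[ g_k^r(z(t)) \big] \Big|\, dt = -\int_0^1 \frac{d}{dt} \big[ g_k^r(z(t)) \big]\, dt = g_k^r(z(0)) - g_k^r(z(1)) = \frac{1}{q} - g_k^r(z). \]

For $k \in A_z$, let $\bar{t}_k = \min\{t_k^*, 1\}$, where $t_k^*$ is defined in (b). Then, we have

\[ \int_0^1 \Big| \frac{d}{dt} \big[ g_k^r(z(t)) \big] \Big|\, dt = \int_0^{\bar{t}_k} \frac{d}{dt} \big[ g_k^r(z(t)) \big]\, dt - \int_{\bar{t}_k}^1 \frac{d}{dt} \big[ g_k^r(z(t)) \big]\, dt = 2 g_k^r(z(\bar{t}_k)) - g_k^r(z) - \frac{1}{q}. \]

Combining the previous two displays, we get

\[ D_\rho^g(z, z_\beta) = \sum_{k \in A_z} \Big[ 2 g_k^r(z(\bar{t}_k)) - g_k^r(z) - \frac{1}{q} \Big] + \sum_{k \notin A_z} \Big[ \frac{1}{q} - g_k^r(z) \Big] = 2 \sum_{k \in A_z} \Big( g_k^r(z(\bar{t}_k)) - \frac{1}{q} \Big). \]

Since $\beta < \beta_s$ and $k \in A_z$, we have

\[ g_k^r(z(\bar{t}_k)) < z(\bar{t}_k)_k \leq z(1)_k = z_k \]

and we conclude that

\[ D_\rho^g(z, z_\beta) < 2 \sum_{k \in A_z} \Big( z_k - \frac{1}{q} \Big) = \|z - z_\beta\|_1. \]

Thus

\[ \frac{d_g(z, z_\beta)}{\|z - z_\beta\|_1} \leq \frac{D_\rho^g(z, z_\beta)}{\|z - z_\beta\|_1} < 1 \quad \text{for all } z \neq z_\beta \text{ in } \mathcal{P}. \]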

Z-Zfl 1 Z-Zfl 1 


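Next, since we are dealing with the straight line segments $\rho$,

\[ \limsup_{z \to z_\beta} \frac{D_\rho^g(z, z_\beta)}{\|z - z_\beta\|_1} = \limsup_{z \to z_\beta} \frac{\|g^r(z) - g^r(z_\beta)\|_1}{\|z - z_\beta\|_1} < 1 \]

by (45), the Mean Value Theorem, and $H(z) \in C^3$. This, in turn, guarantees the continuity required for Condition 6.11:

\[ \limsup_{z \to z_\beta} \frac{d_g(z, z_\beta)}{\|z - z_\beta\|_1} \leq \limsup_{z \to z_\beta} \frac{D_\rho^g(z, z_\beta)}{\|z - z_\beta\|_1} < 1. \]

Thus Condition 6.11 is proved for the GCWP model. Moreover this proves that the family of straight line segments $\rho$ is a neo-geodesic family (see the definition following Condition 6.11). Indeed, there is $\delta \in (0, 1)$ such that

\[ \big\{ \rho : z(t) = z_\beta (1 - t) + z t,\ z \in \mathcal{P} \big\} \text{ is a } NG_\delta \text{ family of smooth curves,} \]

i.e. for all $z \neq z_\beta$ in $\mathcal{P}$, and corresponding $\rho : z(t) = z_\beta (1 - t) + z t$,

\[ \frac{D_\rho^g(z, z_\beta)}{\|z - z_\beta\|_1} \leq 1 - \delta/2. \]

Since the family of straight line segments $\rho$ is a neo-geodesic family $NG_\delta$, the integrals

\[ D_\rho^g(x, z) := \sum_{k=1}^{q} \int_\rho \big| \big\langle \nabla g_k^r(y), dy \big\rangle \big| \]

can be uniformly approximated by the corresponding Riemann sums of small enough step size by the Mean Value Theorem as $H(z) \in C^3$ and therefore each $g_k^r(z) \in C^2$. That is, there exists a constant $C > 0$ that depends on the second partial derivatives of $g^r(z) = (g_1^r(z), \ldots, g_q^r(z))$, such that for $\varepsilon > 0$ small enough, the curve $\rho : z(t) = z_\beta (1 - t) + z t$ in the family $NG_\delta$ that connects $z_\beta$ to $z$ satisfies

\[ \Bigg| \sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle z_i - z_{i-1}, \nabla g_k^r(z_{i-1}) \big\rangle \big| - D_\rho^g(z, z_\beta) \Bigg| < C r \varepsilon^2 \quad \forall z \in \mathcal{P} \text{ s.t. } \|z - z_\beta\|_1 \geq \varepsilon \]

for a sequence of points $z_0 = z_\beta, z_1, \ldots, z_r = z \in \mathcal{P}$ interpolating $\rho$ such that

\[ \varepsilon \leq \|z_i - z_{i-1}\|_1 < 2\varepsilon \quad \text{for } i = 1, 2, \ldots, r. \]

Hence

\[ \frac{\sum_{k=1}^{q} \sum_{i=1}^{r} \big| \big\langle z_i - z_{i-1}, \nabla g_k^r(z_{i-1}) \big\rangle \big|}{\|z - z_\beta\|_1} \leq 1 - \delta/2 + C\varepsilon \leq 1 - \delta/3 \]

for $\varepsilon \leq \delta/(6C)$. This concludes the proof of Condition 6.12.

□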
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